APPENDIX  E

OPTICAL  SYSTOLIC  ARRAY  PROCESSING  USING  A  NOVEL  INTEGRATED 
ACOUSTO-OPTIC  MODULATOR  MODULE 

(TSAI  ASSOCIATES) 


APPENDIX  F 

POLYNOMIAL  CONVOLUTION  ALGORITHM  FOR  MATRIX  MULTIPLICATION 
WITH  APPLICATION  FOR  OPTICAL  COMPUTING 


(HARVARD  UNIVERSITY) 


I.  INTRODUCTION

This report covers a very diverse multi-year effort to
explore and develop the role of optical computing for SDI
purposes. Part of this effort was through subcontractors
whose final reports are separately appended. Other parts of
this work were involved in efforts to unify and publicize
the activity of SDI in optical computing. We believe this
effort was important in counteracting the assertions made by
disgruntled scientists in other fields that SDI funding was
only for "mediocre scientists." The effort was primarily
in two fields: Optical Algebra and Massive Parallel
Holographic Interconnection. In addition, there was
work on a variety of other activities such as pattern
recognition, optical interconnection, and low energy optical
computing. This report will attempt to organize, capsulize,
and comment upon those various activities. In addition, we
include some of the relevant technical documents as
appendixes in order to provide more detail for those who wish
to have it. Finally, we offer a program wrap-up which
demonstrates quite conclusively that the effort under this
contract was not only fruitful, but also generative of
considerable current and future activity. Thus, this program
has planted seeds that will lead, in a significant degree, to
the accomplishment of the original goal of making optical
computing useful for SDI and for America.

II.  OPTICAL  ALGEBRA


Optics has been suggested for algebra for many years,
because there are geometric configurations which permit it
to be done extremely rapidly. Before our work, the major
problem with optical algebra was that high accuracy was
essentially unobtainable. One of the primary goals of our
program was to show that we could use low accuracy optics for
the computationally intense part of algebraic computations
and bootstrap the accuracy with moderately high accuracy
digital electronics in very simple, hard-wired
configurations. Every technical library has many shelves
full of books on numerical algebra. All of them assure the
reader that low accuracy processors are worthless in
obtaining even moderately accurate results for any realistic
problem and that for ill-conditioned or singular equation
sets, low accuracy processors are worthless. If we believe
the results of the great mathematicians who wrote those
books, it is clear that optical algebra is doomed unless it
is possible to somehow change the rules or change the
problems. There follows an account of exactly how we did
that.

According to various estimates, somewhere between 50%
and 75% of all CPU time in the United States is spent in
solving some sort of linear algebra problem. Examples include
least squares analysis, antenna beam steering, linear regression,
computational fluid dynamics, finite element analysis, or
simply N linear equations with N unknowns.

Other  nonlinear  algebra  problems  are  also  important. 
These  include  image  processing,  linear  programming,  and  super 
resolution. 

To  the  extent  that  optics  can  solve  such  problems  in  a 
parallel  fashion,  it  can  lead  to  small  fast  processors  which 
would  greatly  improve  the  utility  of  trackers,  radar,  sonar, 
etc. 


WHAT  IS  THE  CURRENT  STATUS? 


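We want to solve problems like

$$\begin{aligned}
2x_1 + 3x_2 + x_3 &= 4 \\
3x_1 + x_2 + 3x_3 &= 2 \\
3x_1 + 4x_2 + 7x_3 &= 1.
\end{aligned}$$

We can represent these generally as

$$A x = b.$$

In this case

$$A = \begin{pmatrix} 2 & 3 & 1 \\ 3 & 1 & 3 \\ 3 & 4 & 7 \end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad
b = \begin{pmatrix} 4 \\ 2 \\ 1 \end{pmatrix}.$$

The matrix A and the vector b are given. We seek the vector x.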
There is a way to assign a single number (a "norm")
to vectors and matrices. We normally use the Euclidean norm,
e.g.

$$\|x\| = \left( x_1^2 + x_2^2 + x_3^2 \right)^{1/2}.$$


The word "solve" has two different meanings. We
presume there is a "true" answer $x_T$. We can say we have an
$\epsilon$-accurate solution if

$$\|x - x_T\| < \epsilon.$$

A weaker sense of "solve" is

$$\|b - A x\| < \epsilon.$$

This is weaker in the rough sense that some good solutions in
this sense may not be close to $x_T$. On the other hand, for
many problems, this "low residual" solution is perfectly
adequate. The Bimodal Optical Computer (BOC) minimizes the
residual.

One speaks of computational complexity in terms of how
something scales with some resource. We will speak of
spatial and temporal complexity. We will represent an N x N
matrix in parallel using $N^2$ numbers. We say the spatial
complexity scales on the order of $N^2$, written $O(N^2)$. We will
show that the temporal complexity is $O(1)$, i.e., independent
of N, provided that N is small enough to be represented
spatially in our processor.
The most basic concepts are over a century old (due to
Lord Kelvin).

(1) We use a fast, low-accuracy processor to
obtain a first guess $x_0$.

(2) We use a slow, accurate processor to evaluate
the residual $r_0 = b - A x_0$;
if $\|r_0\| < \epsilon$, stop.

(3) Otherwise, use the low accuracy solver to
solve $A \, \Delta x_0 = r_0$ for the correction $\Delta x_0$.
If we could solve that problem accurately, then
$x_1 = x_0 + \Delta x_0$ would have zero residual. Thus,

$$\begin{aligned}
A x_1 &= A (x_0 + \Delta x_0) \\
      &= A x_0 + A \, \Delta x_0 \\
      &= A x_0 + r_0 \\
      &= A x_0 + b - A x_0 \\
      &= b.
\end{aligned}$$

(4) Use the slow, accurate processor to evaluate
$r_1 = b - A x_1$; if $\|r_1\| < \epsilon$, stop. Otherwise
go to (3).
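
Steps (1) to (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BOC hardware: the fast, low-accuracy processor is imitated by a hypothetical `make_low_accuracy_solver` that solves with a copy of A whose entries are each wrong by up to 1%, and the slow, accurate processor is ordinary double-precision arithmetic. All function names are invented for this sketch.

```python
import random

def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm(v):
    """Euclidean norm of a vector."""
    return sum(vi * vi for vi in v) ** 0.5

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def make_low_accuracy_solver(A, rel_error=0.01, seed=0):
    """Hypothetical stand-in for the analog solver: it solves with a copy
    of A whose entries are each off by up to rel_error (relative)."""
    rng = random.Random(seed)
    A_hat = [[a * (1.0 + rel_error * rng.uniform(-1.0, 1.0)) for a in row]
             for row in A]
    return lambda rhs: gauss_solve(A_hat, rhs)

def refine(A, b, low_accuracy_solve, eps=1e-12, max_iter=50):
    """Steps (1)-(4): correct x until ||b - A x|| < eps."""
    x = low_accuracy_solve(b)                              # step (1)
    for iteration in range(max_iter):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]   # steps (2), (4)
        if norm(r) < eps:
            return x, iteration
        dx = low_accuracy_solve(r)                         # step (3)
        x = [xi + di for xi, di in zip(x, dx)]
    return x, max_iter

# The 3 x 3 example system from the text.
A = [[2.0, 3.0, 1.0], [3.0, 1.0, 3.0], [3.0, 4.0, 7.0]]
b = [4.0, 2.0, 1.0]
x, iterations = refine(A, b, make_low_accuracy_solver(A))
residual = norm([bi - yi for bi, yi in zip(b, matvec(A, x))])
print(iterations, residual)
```

Note that the accurate processor only ever performs a matrix-vector product and a subtraction; every solve is delegated to the inaccurate device.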

Some  algebra  problems  resist  accurate  solution  more  than 
others.  In  high  school,  we  solved  N=2  problems  graphically. 
The solution is the point $(x_1^{(s)}, x_2^{(s)})$ at which the two
lines cross. Problems like this, in which the lines cross at a large
angle, are said to be "well conditioned" and are quite rare in real
life. A more common case is one in which the two lines are nearly
parallel.

Such  problems  are  said  to  be  "ill  conditioned."  If  the  lines 
are  parallel,  we  say  A  is  "singular."  Let  us  now  make  this 
somewhat  more  rigorous.  Let  us  define  a  "condition  number" 

$$\chi(A) = \|A\| \cdot \|A^{-1}\|.$$

Then

$$\epsilon(\|x\|) = \chi(A) \, \epsilon(P),$$

where

$\epsilon(\|x\|)$ = relative error in the result and
$\epsilon(P)$ = relative accuracy of the processor.
If we have $\epsilon(P) = 0.1$ (very good optics) and $\chi(A) = 10$
(wonderfully benign problem), then

$$\epsilon(\|x\|) = 1,$$

i.e., 100% errors are likely.

This is why we go to 32 bit floating point electronics. No
one wants an answer accurate to one part in $2^{32}$ ($\approx 4 \times 10^9$).
We need that to get meaningful answers for large $\chi$. The
ultimate ill-conditioning, singularity, corresponds to
infinite $\chi$. Such problems are common.

In roughly 1985, Caulfield showed that this iterative
process converges (roughly) if

$$\epsilon(P) < \frac{1}{2 \chi(A)}.$$

For good optics, $\epsilon(P) = 0.1$. Thus we need

$$\chi(A) < 5$$

to guarantee solution. This is silly. No real problems are
so benign.
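
The error estimate $\epsilon(\|x\|) = \chi(A)\,\epsilon(P)$ and the convergence criterion $\epsilon(P) < 1/(2\chi(A))$ are easy to evaluate for the N=2 case. The sketch below is illustrative only. It assumes the Frobenius (entrywise Euclidean) norm for matrices, which the text does not specify, and the two example matrices and all function names are hypothetical.

```python
def frobenius_norm(M):
    """Entrywise Euclidean norm of a matrix (an assumed choice of norm)."""
    return sum(v * v for row in M for v in row) ** 0.5

def inverse_2x2(M):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("singular matrix: condition number is infinite")
    return [[d / det, -b / det], [-c / det, a / det]]

def condition_number(M):
    """chi(A) = ||A|| * ||A^-1||."""
    return frobenius_norm(M) * frobenius_norm(inverse_2x2(M))

def expected_relative_error(chi, eps_p):
    """epsilon(||x||) = chi(A) * epsilon(P)."""
    return chi * eps_p

def convergence_guaranteed(chi, eps_p):
    """Criterion epsilon(P) < 1 / (2 chi(A))."""
    return eps_p < 1.0 / (2.0 * chi)

well = [[1.0, 1.0], [1.0, -1.0]]   # two lines crossing at right angles
ill = [[1.0, 1.0], [1.0, 1.01]]    # two nearly parallel lines
for name, M in (("well conditioned", well), ("ill conditioned", ill)):
    chi = condition_number(M)
    print(name, chi, expected_relative_error(chi, 0.1),
          convergence_guaranteed(chi, 0.1))
```

With 10% optics, only the first matrix meets the criterion; the nearly parallel lines give a condition number in the hundreds.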

In 1987, we showed that replacing A by $A' = A + E$, where
E is an error matrix and

$$\|E\| / \|A\| \ll 1,$$

leads to convergence for all problems independently of $\chi$.
For large $\chi$, the x which minimizes $\|r\|$ may be less close
to $x_T$ than would be the case for small $\chi$. Nevertheless, we
can drive $\|r\|$ to zero in very few iterations even for
singular matrices. Call this Breakthrough 1.

To do the fast, low-accuracy solution in $O(1)$ time, we
use another trick. We employ a parallel A x = y device.

[Figure: parallel optical processor with input x and output y = A x.]


These are easy in optics. Wai Cheng and Caulfield showed
that if we correct $x_k$ with a signal proportional to $b_k - y_k$,
for all k, then this system would "relax" from any starting x
to one satisfying A x = b (in the low $\|r\|$ sense) under the
circumstance that A is "positive definite." To explain this,
we need one more diversion.

A vector e such that

$$A e = \lambda e,$$

where $\lambda$ is a scalar, is said to be an "eigenvector," and
$\lambda$ is the corresponding "eigenvalue." Let us
arrange the eigenvalues of A such that

$$\lambda_1 < \lambda_2 < \cdots < \lambda_r$$

(r connotes "rank," a concept we choose not to define here).
Interestingly,

$$\chi(A) = |\lambda_r| / |\lambda_1|.$$
The interesting thing for our purposes is that the relaxation
processor converges at a rate (roughly) of

$$e^{-\lambda_1 t}.$$

Obviously, if $\lambda_1 \le 0$, it does not converge. Here t is
normalized by the round trip time in the system. A matrix
for which $\lambda_1 > 0$ is said to be positive definite.

A matrix

$$B = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$

can undergo a row-for-column switch to form a transpose

$$B^T = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}.$$

Since the matrix elements may be complex, we can complex
conjugate a matrix A to get $A^*$. We call

$$(A^*)^T = (A^T)^* = A^H$$

the Hermitian conjugate of A. For any matrix, both $A A^H$ and $A^H A$ are
nonnegative definite ($\lambda_1 \ge 0$). We noted that $A^H A + E$ and
$A A^H + E$ are positive definite if E > 0.

Note, though, that

$$A x = b$$

implies

$$A^H A x = A^H b.$$

Write

$$B = A^H A$$

and

$$c = A^H b.$$

Then

$$B x = c$$

and B is nonnegative definite (likewise for $A A^H$). Applying
our method to this system gives convergence in all cases even though

$$\chi(A^H A) = \chi(A A^H) = \chi^2(A),$$

a normally disastrous event. These realizations are
Breakthrough 2.
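
The relaxation idea can be imitated in discrete time. The sketch below is a hedged illustration, not the optical feedback loop itself: it assumes real matrices (so $A^H$ is the transpose), replaces the continuous relaxation with repeated corrections proportional to $c - Bx$, and uses a feedback gain of 1/trace(B), chosen here only because the trace of a nonnegative definite matrix is never smaller than its largest eigenvalue. The function names, the gain, and the singular test matrix are assumptions of this example.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def matmul(M, N):
    Nt = transpose(N)
    return [[sum(a * b for a, b in zip(row, col)) for col in Nt] for row in M]

def relax(A, b, steps=5000):
    """Discrete-time relaxation on B x = c with B = A^T A and c = A^T b."""
    At = transpose(A)                  # real case: A^H is the transpose
    B = matmul(At, A)                  # nonnegative definite
    c = matvec(At, b)
    gain = 1.0 / sum(B[i][i] for i in range(len(B)))  # 1 / trace(B)
    x = [0.0] * len(c)                 # any starting x would do
    for _ in range(steps):
        y = matvec(B, x)               # the parallel "B x = y" device
        x = [xi + gain * (ci - yi) for xi, ci, yi in zip(x, c, y)]
    return x

def residual_norm(A, b, x):
    return sum((bi - yi) ** 2 for bi, yi in zip(b, matvec(A, x))) ** 0.5

# The 3 x 3 example from the text.
A = [[2.0, 3.0, 1.0], [3.0, 1.0, 3.0], [3.0, 4.0, 7.0]]
b = [4.0, 2.0, 1.0]
print(residual_norm(A, b, relax(A, b)))

# A singular but consistent system (parallel, coincident lines).
A_singular = [[1.0, 2.0], [2.0, 4.0]]
b_singular = [3.0, 6.0]
print(residual_norm(A_singular, b_singular, relax(A_singular, b_singular)))
```

In the singular case the iteration still drives the residual down, which is the "low residual" sense of solving used throughout this section; it makes no claim to recover a unique $x_T$.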

Many other things done in BOCs are pretty, but those two
are the essence. Of the two, Breakthrough 1 is essential.
Breakthrough 2 allows $O(1)$ solutions.


SUMMARY

CONVENTIONAL ALGEBRA ON                   BIMODAL OPTICAL COMPUTERS
DIGITAL COMPUTERS

•  SEEKS $\|x - x_T\| < \epsilon$         •  SEEKS $\|b - A x\| < \epsilon$

•  REQUIRES ROUGHLY $O(N^3)$              •  $O(1)$ TEMPORAL COMPLEXITY
   TEMPORAL COMPLEXITY

•  ALGORITHM MATCHED TO PROBLEM           •  CONSTANT ALGORITHM SUFFICES

•  $\epsilon(\|x\|) \propto \chi(A)$      •  $\|b - A x\| \to 0$
                                             INDEPENDENTLY OF $\chi(A)$

•  $\epsilon(\|x\|) \propto \epsilon(P)$  •  $\|b - A x\| < \epsilon$
                                             INDEPENDENTLY OF $\epsilon(P)$


The highlights of this period include a laboratory
demonstration of an $O(1)$ time solver of even singular matrix
equations and the first rigorous mathematical proof of how
this works. Appendix A gives those details.




In the appendix, we show papers from optics journals and
mathematics journals giving in mathematical detail the proof
and illustration that these concepts are workable. In terms
of applications to SDI, these might range from signal
processing (where constrained linear equations lead to fast
image restoration) to phased array radar (where the magnitude
of jammer signals is essentially irrelevant and processor
speed is independent of the number of elements in the radar).
While  IBM  is  working  on  the  approach  we  developed  as  a 
possible  electronic  product,  Nodal  Systems  Corporation  is 
planning  on  investing  tens  of  millions  of  dollars  to  develop 
this  technology  as  an  optical  algebra  processor.  That 
processor  would  be  able  to  operate  on  very  large  (tens  of 
thousands  in  each  dimension)  algebraic  problems  and  achieve 
high  accuracy  even  for  ill-conditioned  systems  at  very  high 
speed. 

III.  MASSIVE  PARALLEL  INTERCONNECTION 

Before this program, what was meant by massive parallel
interconnection was the connection of each element of a one
dimensional optical input to each element of a one
dimensional optical output. If both input and output had
dimensionality N, then there were $N^2$ parallel, weighted
optical interconnections. It was argued that this offered an
advantage over electronics. The argument may well be correct
for a large N, but it is not altogether certain. To achieve
an indisputable advantage for optics over electronics, we
sought to connect an N x N input array to an N x N output array
using $N^4$ parallel weighted interconnections. For large N
(100  to  1000),  the  number  of  parallel  weighted 
interconnections  is  significantly  more  than  can  ever  be 
accomplished  with  electronics.  Let  us  try  to  justify  that 
statement  by  considering  connecting  a  1000  x  1000  array  of 
electrical  signals  to  a  1000  x  1000  array  of  other  electrical 
elements  using  wires.  By  this  conceptual  design,  we  will 
allow  ourselves  22nd  century  technology.  For  instance,  we 
will  assume  that  the  interconnections  can  be  made  with 
submicron  diameter  wires  such  that  the  wires  plus  insulators 
are  only  one  micron  in  diameter.  This  means  that  the  full 
set of $10^{12}$ wires could fit in a very small cross sectional
area  of  only  one  meter  by  one  meter.  Actually,  of  course, 
that  is  not  the  case.  The  reason  a  much  larger  area  will  be 
required  is  that  the  wires  must  be  scrambled  and  criss-cross 
one  another.  If  we  are  very  clever,  perhaps  we  could  fit 
that  into  a  two  meter  by  two  meter  area.  The  length  of  that 
bundle  of  wires  would  have  to  exceed  the  width.  We  can 
optimistically  assume  that  the  length  of  the  interconnection 
bundle  will  be  only  four  meters.  Thus,  the  whole 
interconnection  can  fit  in  a  package  only  two  meters  by  two 
meters  by  four  meters.  SDI  will  not  fly  this  in  the  head  of 
a  missile,  but  the  assembly  of  that  many  wires  can  at  least 
be done. Some technology, not known to us, must be used to
set the resistances of the various wires (the weights).
Since we don't know what that technology is, we will not
explain it here. The interaction among currents in those
various wires will be very severe. This will vastly increase
both  the  delays  (which  will  be  quite  variable  among 
interconnections)  and  the  required  power.  Let  us  again 
invoke  future  technological  wonders  and  assume  those  problems 
can  be  made  to  vanish  or  be  negligible.  Then  the  sole 
remaining problem is simply to bond the $10^{12}$ submicron wires
to  the  appropriate  bonding  pads.  Again  invoking  21st  century 
technology,  we  assume  a  bonding  machine  can  be  made  that 
makes  1000  perfect  bonds  of  submicron  wires  to  appropriate 
bonding pads every second. Such a bonding machine would be a
great marvel indeed, but it would have to operate
continuously for roughly thirty years ($10^{12}$ bonds at 1000
bonds per second is $10^9$ seconds) just to hook up the system. Of
course,  we  have  neglected  the  question  of  how  one  might  check 
out  such  a  system.  Nevertheless,  these  considerations 
suggest  that,  for  all  practical  purposes,  such 
interconnection  is  impossible  with  electronics. 

BASIC  CONCEPT 

Fig. 2 shows the basic concept schematically. The input
is a two dimensional array of modulators (a Spatial Light
Modulator or SLM). In this drawing, it is shown as a
transmissive SLM. In other cases, it can be reflective. The
two dimensional output array can be thought of as detectors,
bi-stable optical devices, or any other useful components. A
lens or lens system images a two dimensional array of
holograms onto the two dimensional array of outputs through
the SLM. All of the holograms are simultaneously
illuminated. The ij hologram is imaged onto the ij element of
the output array. The strength of light from the ij hologram to
the kl element of the SLM may be called $T_{ijkl}$. The amount of
light transmitted through the kl element depends on the input,
which we may call $a_{kl}$. Thus the transmitted light from the ij
hologram through the kl element is $T_{ijkl} a_{kl}$. The lens
collects such contributions over the entire SLM. That is, the
amount of light arriving at the ij element in the output plane is

$$b_{ij} = \sum_{kl} T_{ijkl} \, a_{kl}.$$

We recognize this as the ij element of the product of the two
dimensional matrix A whose ij element is $a_{ij}$ with the four
dimensional tensor whose ijkl component is $T_{ijkl}$. Rewriting
this in more compact form, we have

$$B = T A.$$

A very detailed analysis of the potential and limitations
of this technique may be found among the references in
Appendix B.
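
The contraction $b_{ij} = \sum_{kl} T_{ijkl} a_{kl}$ is what the optics computes in parallel; a sequential sketch makes the index bookkeeping explicit. The weight tensor below is a hypothetical example chosen for illustration (each output cell sums its four nearest input neighbors, in the spirit of a cellular array); it is not taken from the program's hardware, and the function names are invented.

```python
def interconnect(T, a):
    """b[i][j] = sum over k, l of T[i][j][k][l] * a[k][l]."""
    n = len(a)
    return [[sum(T[i][j][k][l] * a[k][l]
                 for k in range(n) for l in range(n))
             for j in range(n)]
            for i in range(n)]

def neighbor_sum_weights(n):
    """Hypothetical weights: output (i, j) collects light from the four
    nearest input neighbors of (i, j); all other weights are zero."""
    T = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k, l in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= k < n and 0 <= l < n:
                    T[i][j][k][l] = 1.0
    return T

n = 3
a = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
T = neighbor_sum_weights(n)   # n**4 = 81 weights for a 3 x 3 array
b = interconnect(T, a)
for row in b:
    print(row)
```

The loop nest costs on the order of $N^4$ multiply-adds, which is exactly the number of weighted interconnections the holographic scheme provides simultaneously.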

APPLICATIONS 

Numerous  applications  of  this  technology  can  be  found. 
The  one  developed  especially  for  this  program  was  massively 
parallel  cellular  array  processors.  This  is  discussed  in  the 
appendix.  Numerous  other  applications  are  obvious  and  have 
begun  to  be  discussed  in  the  literature.  Perhaps  the  most 
obvious  is  optical  neural  networks.  Other  applications  are 
generalized  Hough  transforms  and  digital  optical  computers. 

In  another  portion  of  the  overall  ONR/SDI  program,  Peter 
Guilfoyle  and  co-workers  made  significant  improvements  in  the 
digital  optical  computer  concepts  first  described  by 
Morozov*. If we are willing to do simple pre- and post-processing
of input data, we can generalize this technique to become a
general purpose optical computer. This work has attracted
worldwide attention and numerous citations. In addition,
both  the  neural  network  aspects  and  the  digital  optical 
computing  aspects  are  being  pursued  at  multi-million  dollar 
levels  by  Nodal  Systems  Corporation.  Again,  the  program  has 
done  its  job  of  stimulating  an  entire  new  area  (Appendix  C) . 

IV.  OTHER  DEVELOPMENTS

An  important  early  paper  of  this  program  was  on  optical 
Fredkin  gates.  Since  the  publication  of  that  paper,  this 
work  has  diverged  into  two  directions.  First,  there  has  been 
considerable  work  in  extending  the  two  dimensional  Fredkin 
gate  array  to  three  dimensions.  This  appears  to  have  some 
real  advantages  over  prior  technology.  Second,  this  work  has 
led  (under  other  sponsorship)  to  the  realization  that  optics 
can accomplish what computer theorists have been dreaming of
for the last 15 years: the performance of computing operations
at less than kT per operation. All of these matters are
discussed in substantial detail in Appendix C.


* H. E. Elion and V. N. Morozov, "Optoelectronic Switching
Systems in Telecommunications and Computers," Marcel
Dekker, N.Y. (1984).

There was important early work in making optical pattern
recognition filters that not only had the invariance
properties which are being eagerly sought but also the
property of being very easy to fabricate. This work has led
to considerable progress. We are now at the point where
immensely powerful optical pattern recognition masks can be
designed and fabricated in a very simple way.

Finally,  there  was  some  preliminary  work  on  how  these 
concepts  apply  to  optical  neural  networks. 

These  areas  are  expanded  upon  in  Appendix  D. 

V.  CONCLUSIONS 

A  variety  of  totally  new  concepts  were  introduced  and 
established  as  feasible  during  the  course  of  this  contract. 
Each  of  them  is  a  subject  of  intense  continuing  research 
around  the  world.  Many  of  them  are  now  being  pursued 
commercially  in  America  and  will  undoubtedly  find  their  way 
into  the  SDI  effort  of  our  country.  In  addition,  the  massive 
commercial  applications  anticipated  and  funded  as  a  result  of 
work  reported  here  constitute  an  outstanding  example  of  the 
usefulness  of  the  SDI  program  in  creating  new  technology  of 
broad  general  usefulness  for  America. 

APPENDIX  A


BIMODAL  OPTICAL  COMPUTERS 


The  first  paper  in  this  field  (Appl.  Opt.  25,  3128)  was 
completed  before  the  beginning  of  this  contract.  It  showed  (as 
described  in  the  main  text  of  our  report)  the  severe  limitations  in 
principle  on  Bimodal  Optical  Computer  (BOC)  convergence. 

The  first  work  under  this  contract  (Opt.  Eng.  26,  22)  showed 
that  in  many  cases  convergence  occurred  even  when  it  could  not 
be  guaranteed. 

The  two  key  papers  showed  how  to  get  convergence  for  all 
matrices  (Appl.  Opt.  26,  4906)  and  why  this  method  works 
(Linear  and  Multilinear  Algebra  25,  215).  The  experimental 
demonstration  followed  immediately  (SPIE  936,  315). 

Extending this to new algebra problems like eigenproblems
(SPIE 634, 86) and nonlinear algebra (SPIE 936, 309) increased
the utility.

The ultimate SDI application is jam resistant high speed radar
array data processing (Microwave and Optical Technology Letters
1, 236).


Bimodal  optical  computers

H.  John  Caulfield,  John  H.  Gruninger,  Jacques  E.  Ludman,  K.  Steiglitz,  H.  Rabitz,  J.  Gelfand,  and  E.  Tsoni 


Analog  optical  solutions  of  numerical  problems  tend  to  be  fast,  simple,  and  inaccurate.  Digital  optical  or 
electronic  solutions  to  the  same  problems  tend  to  be  slower,  harder,  and  more  accurate.  In  circumstances 
outlined  here,  hybrid  analog-digital  systems  can  be  built  which  give  the  accuracy  of  digital  solutions  with 
intermediate  degrees  of  speed  and  simplicity.  Because  at  any  instant  these  processors  are  working  in  either 
the  analog  or  the  digital  mode,  we  call  them  bimodal  optical  computers. 


I.  Introduction 

While optical digital computers have been drawing
great attention,1-7 it is only in analog computation that
optics is known to excel over electronics. In this paper
we offer a limited exploration of a proposed link between
these two fields of optics. That is, we will discuss
hybrid optical numerical processors. We seek the
numerical accuracy of digital computing while still
retaining some of the speed and power advantages of
analog optics. To do this we must mix analog optics
with digital electronics (or electrooptics or optics) to
bootstrap the accuracy. We call this hybrid a bimodal
optical computer.

While some of these concepts are new to optics,
many are not new to science in general. Our purpose
in this paper is to call the attention of optics workers to
this area. We will present a general approach and
then specialize to one very specific and simple problem:
linear algebraic equations. The method is
clearly extendable to nonlinear problems and other
linear problems.

II.  Generic  System 

The  generic  system  is  comprised  of  three  properly 
interacting  systems:  an  optical  analog  solver  of  the 
basic  problem;  a  memory;  and  an  accurate  (digital  or
hybrid)  calculator  of  the  solution  accuracy.  The  basic 
cycle  is  as  follows: 

calculate  an  approximate  solution  with  the  optical 
analog  processor; 

remember  that  solution  to  high  accuracy; 

calculate  the  solution  accuracy  with  the  accurate 
computer; 

repose  the  problem  as  an  error  reduction  problem; 

solve  with  an  optical  analog  processor; 

using the just-calculated improvement and the
stored prior solution, calculate and remember the
improved solution with the accurate computer;

calculate  the  solution  accuracy  with  the  accurate 
computer; 

if  the  solution  is  accurate  enough,  stop; 

if  not,  recycle. 

Clearly,  the  convergence  condition  is  that  the  error 
be  reduced  in  each  iteration.  If  this  is  the  case,  as  we 
will  show,  the  optical  analog  processor  no  longer  limits 
solution  accuracy. 

In a purely digital system, the primary consumer of
space, weight, power, time, and cost would be the solver
(direct or iterative) of the problem solved by the
relatively small, low-weight, power conservative, fast,
and inexpensive optical analog processor. Thus there
is the potential for significant overall system improvement
using this hybrid approach.

There are two major forms the accurate processor
can take. First, it can be a special purpose, fast,
inexpensive digital processor. For reasons which will soon
become evident, we call the hybrid system involving
such a processor a mathematical problem solver. Second,
the accurate processor could be a physical system
interacting with the world. The problem is then
isomorphic with control theory. We call such a
processor a physical problem solver. With a mild effort,
the reader should become convinced that these two
problem solvers use the same mathematics.

III.  Accuracy  Analysis

We will examine the bimodal optical computer
(BOC) with specific emphasis on linear algebra as
might be used, for example, for numerical solution of
partial differential equations. The generic BOC
method was originally proposed by Thompson6 some
time ago for iteratively improving the precision of
mechanical devices which were used for the simultaneous
solution of linear equations. This method appears to
provide some considerable benefit for situations where
a low-accuracy but fast device is available for providing
approximate solutions to partial differential equations.
This can then be linked to a higher accuracy
device which is particularly well suited for forward
substitution of the approximate solution into the original
equation. The BOC iterative scheme, besides having
been proposed by Lord Kelvin,6 is a standard numerical
approach to the iterative solution of linear
systems and has
been analyzed with respect to numerical round-off
error by Wilkinson7 and Stewart8 among others.9 A
working model of this analog and digital bimodal electrical
computer has also been constructed by Karplus.10
This work reexplores and extends the prior
work and incorporates modern linear and nonlinear
optical computer techniques.

We can summarize this idea in the following way.
Suppose we want to solve the n-dimensional linear
system of equations

$$A x = b. \tag{1}$$


Here  A  is  a  given  matrix,  b  is  a  given  vector,  and  x  is  the 
sought-after  solution  vector. 

These  problems  are  of  great  interest  in  their  own 
right.  In  addition  such  systems  with  high  dimensions 
arise  when  linear  partial  differential  equations  are 
solved  by  the  finite  difference  method.  Many  other 
problems  can  be  recast  in  this  form.  Suppose  further 
that  we  have  built  a  discrete  optical  analog  processor 
for  this  problem  which  gives  an  approximate  solution 
that  can  be  summarized  with  the  equation 


$$\hat{A} \hat{x} = \hat{b}, \tag{2}$$

where $\hat{A}$ and $\hat{b}$ differ from A and b because of the
limited accuracy of the analog components. We now
have an approximate solution to our problem, $\hat{x}$, which
typically is accurate to a few percent. Next, we use a
digital electronic computer to form the residual

$$r = b - A \hat{x} \tag{3}$$

using the actual high-precision versions of A and b.
Notice that this step entails only substitution of the
current solution $\hat{x}$ in the modal equations, a relatively
fast operation for even a modest digital computer.
Subtracting Eq. (3) from Eq. (1) with digital electronics,
we can write

$$A (x - \hat{x}) - r = 0, \tag{4}$$

call the current solution error

$$x - \hat{x} = \Delta x, \tag{5}$$

and write Eq. (4) as

$$A (\Delta x) = r. \tag{6}$$


We now have a problem of the same form as the original
with A being the same matrix, except with the
inhomogeneity term b replaced by the residual vector r.

We now want to use the analog optical computer
again to estimate $\Delta x$ and refine the solution $\hat{x}$, but we
first scale the equations by an appropriate number S to
bring the voltages and currents back to the levels in the
first solution. Thus we solve

$$A y = S r \tag{7}$$

and then use the estimate

$$\Delta x \approx y / S \tag{8}$$

to refine the current solution to

$$\hat{x} \leftarrow \hat{x} + \Delta x. \tag{9}$$


This process can be iterated and in favorable conditions
will converge quickly to solutions of accuracy
limited only by the digital computer representation of A, b,
and the digital computation of Eq. (3). The description
above for the iterative procedure was given in
terms of a linear equation; however, this concept may
also be applied to nonlinear systems and would take
advantage of the unique capacity of nonlinear analog
circuits for the solution of the nonlinear algebraic
equations of the discretized system. An analysis similar
to the above treatment will again apply since the
equations become quasi-linear near the true solution.

We might call this a floating-point analog computation, where the scaling
parameter S acts as a radix, varying from stage to stage with the size of
the residuals in the equations. We note that this technique is quite
similar to the very standard iterative numerical methods, such as Newton's
method. In addition, we see that this technique marries analog and digital
computers in a most congenial way: we take advantage of the speed and
highly parallel nature of the analog system as well as the memory and high
precision of the digital system in the external loop.
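
The loop of Eqs. (7)-(9) can be sketched numerically. The Python fragment below is illustrative only: the function names, the 8-bit quantizer standing in for the limited dynamic range of the analog stage, the 1% perturbation of A, and the choice S = 1/max|r_i| are all assumptions introduced here, not details given in the text.

```python
import numpy as np

def analog_solve(A_analog, rhs, bits=8):
    """Hypothetical analog stage: the right-hand side is quantized to the
    fixed range [-1, 1] with 2**-bits resolution, then the perturbed
    system is solved."""
    step = 2.0 ** -bits
    rhs_q = np.round(np.clip(rhs, -1.0, 1.0) / step) * step
    return np.linalg.solve(A_analog, rhs_q)

def refine(A, b, A_analog, iterations=30):
    """Digital outer loop: residual in double precision, correction from
    the (simulated) analog stage."""
    x = np.zeros_like(b)
    residual_sizes = []
    for _ in range(iterations):
        r = b - A @ x                      # digital residual
        size = np.max(np.abs(r))
        residual_sizes.append(size)
        if size == 0.0:
            break
        S = 1.0 / size                     # assumed scaling: S*r fills the analog range
        y = analog_solve(A_analog, S * r)  # Eq. (7): A y = S r
        x = x + y / S                      # Eqs. (8) and (9)
    return x, residual_sizes

rng = np.random.default_rng(0)
n = 5
A = 4.0 * np.eye(n) + rng.uniform(-0.5, 0.5, (n, n))
b = rng.uniform(-1.0, 1.0, n)
A_analog = A + 0.01 * np.max(np.abs(A)) * rng.standard_normal((n, n))
x, residual_sizes = refine(A, b, A_analog)
print(residual_sizes[:6])
```

Because S is recomputed from the current residual at every pass, the simulated analog stage always works near full scale, which is the sense in which S behaves like a radix.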

We have examined the stability and convergence properties of the iteration
process for this BOC. To first order we can model the error caused by
solving the system on an analog computer (Eq. 2) by [Ref. 8, Corollary
(3.7)]

    $(A + E)^{-1} = (I - F) A^{-1}$,    (10)

where E is the error in the matrix due to the analog representation. The
norm of F is bounded by

    $\|F\| \le \frac{\kappa(A)\,(\|E\|/\|A\|)}{1 - \kappa(A)\,\|E\|/\|A\|}$.    (11)

Here $\|\cdot\|$ is a matrix norm, and the condition number of A is defined by

    $\kappa(A) = \|A\| \cdot \|A^{-1}\|$.    (12)

Substituting Eq. 10 into Eq. 6 gives

    $x_{k+1} = x_k + \Delta x_k = x_k + (I - F) A^{-1} (b - A x_k)$.    (13)

Letting $x^* = A^{-1} b$ be the exact solution, we can rearrange this to
yield

    $x_{k+1} - x^* = -F (x^* - x_k)$,    (14)

and taking the norms of both sides,

    $\|x_{k+1} - x^*\| \le \|F\| \, \|x^* - x_k\|$.    (15)

We thus have a sufficient condition for geometric convergence of the
process, namely, $\|F\| < 1$, which is satisfied if

    $\kappa(A)\,\|E\|/\|A\| < 1/2$,    (16)

where $\kappa(A)$ is the condition number of A, and $\|E\|$ is the error in
the analog representation of the true matrix A. Of course, when convergence
takes place, the errors in the digital computation may ultimately overtake
the effect of the analog error that is modeled here, although the effects
of analog noise may prevent that kind of ultimate accuracy.
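
The bound of Eq. (11) and the geometric error decay of Eq. (15) can be checked on a small example. The sketch below is a numerical illustration under assumed values (a random 6 × 6 test matrix, the 2-norm, and an error matrix scaled so that ‖E‖/‖A‖ = 0.01); none of these choices come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = 3.0 * np.eye(n) + rng.uniform(-0.4, 0.4, (n, n))
G = rng.standard_normal((n, n))
# Scale the error matrix so that ||E|| / ||A|| = 0.01 in the 2-norm.
E = G * (0.01 * np.linalg.norm(A, 2) / np.linalg.norm(G, 2))

kappa = np.linalg.cond(A, 2)
rel_err = np.linalg.norm(E, 2) / np.linalg.norm(A, 2)
F = np.eye(n) - np.linalg.solve(A + E, A)           # from (A+E)^-1 = (I-F) A^-1
norm_F = np.linalg.norm(F, 2)
bound = kappa * rel_err / (1.0 - kappa * rel_err)   # right side of Eq. (11)

b = rng.uniform(-1.0, 1.0, n)
x_star = np.linalg.solve(A, b)
x = np.zeros(n)
errors = [np.linalg.norm(x - x_star)]
for _ in range(4):
    x = x + np.linalg.solve(A + E, b - A @ x)       # Eq. (13)
    errors.append(np.linalg.norm(x - x_star))

print(f"kappa={kappa:.2f}  ||F||={norm_F:.4f}  bound={bound:.4f}")
print([f"{e:.2e}" for e in errors])
```

Each printed error should be smaller than the previous one by at least the factor ‖F‖, as Eq. (15) requires.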

Since $\|E\|/\|A\|$ may be ~0.01 for optical analog processors, Eq. (16)
requires that $\kappa(A) < 50$. This is quite restrictive but perhaps quite
pessimistic. Simple equilibration of rows may change A to A' with

    $\kappa(A') \ll \kappa(A)$.

Furthermore, a variety of other mathematical tricks can be performed. We
can replace Eq. (9) with

    $\hat{x} \leftarrow \hat{x} + \theta \Delta x$    (17)

and seek to use the convergence factor $\theta$ to force convergence in
analogy with stochastic approximation. We can replace Eq. (6) by

    $(A + p q^T)(\Delta x) = r$,    (18)

where q is chosen orthogonal to $\Delta x$ and p is a free vector so that
$p q^T$ is of the same dimensionality as A. Calling

    $A' = A + p q^T$,    (19)

we seek p values to make

    $\kappa(A') \ll \kappa(A)$.    (20)

IV.  Nonlinear  Problems

Perhaps the most important payoff with BOCs may be associated with the
solution of nonlinear problems. Many physical phenomena result in nonlinear
differential or ultimately algebraic equations for solution. Such problems
are notoriously difficult to treat by conventional numerical methods on
digital computers, since the algorithms involve linearization or iteration,
with convergence being slow or perhaps nonexistent in highly nonlinear
problems. A more suitable approach would be based on directly building the
nonlinear behavior into the calculation process. It appears possible to
construct hybrid machines based on this logic following lines parallel to
those discussed in Sec. III. The key to this approach rests on the fact
that nonlinear electronic or optical elements can be readily made and
integrated together into an overall nonlinear computer.

As a simple example of a nonlinear problem we may consider the search for
roots of a polynomial p(x) in the real variable x. It is straightforward to
use optical methods to evaluate polynomials via Horner's rule. Optical
polynomial evaluation can be analog (Ref. 11) or digital. Some tricks to
accommodate dynamic range, allow root searching by scanning, extend the
range of problems addressed, etc. are given in the latter reference. Root
searching for real roots simply by scanning through x and watching for
p(x) = 0 conditions is straightforward and fast. It is, however, not likely
to be highly accurate. Suppose we identify an approximate real root $x_0$.
We can then evaluate $p(x_0)$ and

    $p_1(x) = p(x) - p(x_0)$

digitally. Assuming we are now close to the true root, we can now change
the scale of both $p_1$ and x to gain sensitivity. We might substitute
y = 10x and $q_1 = 10 p_1$ and then search $q_1(y)$ as before. This leads
to a better approximation $x_1$, as can be verified by digital evaluation
of $p(x_1)$. Accuracy is limited by the condition number of the polynomial
because that limits the accuracy of the polynomial evaluation. Other
similar examples can be found, and a general set of logic can be set forth
as discussed below.
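
The scan-and-rescale idea can be illustrated with a short sketch. It is a simplification made here for illustration: the coarse scan is done in exact arithmetic rather than by a noisy analog evaluator, and the rescaling y = 10x is represented by shrinking the scan window tenfold at each stage. The function names and the test polynomial are hypothetical.

```python
def horner(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs run from the
    highest-degree term down to the constant."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def scan_for_sign_change(coeffs, lo, hi, points=11):
    """Scan [lo, hi] on a uniform grid and return the first subinterval
    on which p changes sign (or touches zero), else None."""
    step = (hi - lo) / (points - 1)
    prev_x, prev_p = lo, horner(coeffs, lo)
    for i in range(1, points):
        x = hi if i == points - 1 else lo + i * step
        p = horner(coeffs, x)
        if prev_p * p <= 0.0:
            return prev_x, x
        prev_x, prev_p = x, p
    return None

def zoom_root(coeffs, lo, hi, stages=8):
    """Repeat the coarse scan on a window that shrinks tenfold per stage."""
    for _ in range(stages):
        bracket = scan_for_sign_change(coeffs, lo, hi)
        if bracket is None:
            raise ValueError("no sign change in the scanned interval")
        lo, hi = bracket
    return 0.5 * (lo + hi)

coeffs = [1.0, 0.0, -2.0, -5.0]   # p(x) = x^3 - 2x - 5
root = zoom_root(coeffs, 2.0, 3.0)
print(root, horner(coeffs, root))
```

Each stage needs only an eleven-point scan, so the resolution demanded of any single scan stays fixed while the overall accuracy improves by one decimal digit per stage.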

A nonlinear computer of the type discussed in the first paragraph could
likely be of limited accuracy but capable of achieving an extremely rapid
solution without the introduction of artificial linearization or iteration
algorithms. The machine could be used alone or incorporated into an overall
hybrid device along the lines discussed in Sec. III and the polynomial root
searching example. This would entail introducing a high-accuracy digital
computer as a means of monitoring residual errors. Updated corrections to
the original fully nonlinear solution could be achieved by again using the
nonlinear solution if it is close enough to the true answer that the
nonlinear computer effectively operates in the linear mode after the first
cycle. As an alternative it would be possible to construct an additional
linearized version of the machine for the accuracy updates on the solution.
These approaches may be theoretically modeled as well as demonstrated in
the laboratory, and we plan to carry out such studies in the future.


V.  Conclusions 

Analog optics, when adequate for a task, is usually superior in speed,
size, power consumption, and cost to all competitors. What we have
suggested here is a means to extend the set of situations for which analog
optics is adequate. Many studies remain to be performed on both algorithms
and hardware. Nevertheless, the general concept of a hybrid system appears
to be extremely promising.

Work sponsored primarily by the U.S. Army Research Office under contract
DAAG-29-84-C-0026.
Optical monitoring of weld penetration

A system is being developed to monitor weld penetration optically and
produce a signal for controlling an arc welder. The system is aimed at
automatic welders, robot welders in particular. Made from small, low-cost
components and utilizing optical fibers to conduct the signals, the system
is immune to the electromagnetic interference that is common in industrial
environments.

The monitor directs collimated light from a small diode laser at the molten
pool of metal beneath the arc (see Fig. 16). A filter intercepts the
reflected beam to suppress extraneous light, including light from the
welding arc. A position-sensitive detector at a distance from the pool
intercepts the beam reflected by the pool.

If the weld penetrates the workpiece completely, the curvature of the pool
surface suddenly changes. This causes a sudden deflection of the reflected
light beam, and consequently a displacement of the beam spot on the
detector. Signal-processing circuitry determines the amplitude and rate of
beam displacement to detect penetration and to generate control signals for
the robot to regulate welding parameters.

Fig. 16. Bouncing off the meniscus of a pool of molten metal, a laser beam
impinges on a position-sensitive photodetector. The beam diameter can be
adjusted for the width of the weld. Optical filters screen out the light
from the arc.

The monitor is insensitive to changes in weld current, welder speed, and
the thermal properties of the welded metal except as they affect weld
penetration. The monitoring principle is adaptable to other types of
welding, including tungsten/inert-gas, laser, and electron-beam techniques.

This work was done by Jonathan Maram of Rockwell International Corp. for
Marshall Space Flight Center. Refer to MFS-29107.


High-flux atomic-oxygen source

A proposed apparatus can generate high fluxes (about 10^15 atoms/cm^2·s) of
ground-state (³P) oxygen atoms. The kinetic energy would be variable in the
range of 3-10 eV, and the beam would be free of contaminants, such as ions,
metastable ¹S or ¹D oxygen atoms, or other neutral species. Designed
specifically to study the degradation of materials and spacecraft glow
phenomena in low earth orbits, this oxygen-atom beam source could be used
to study gas-phase collision phenomena involving energetic oxygen atoms.

In the proposed source (see Fig. 17) electrons are generated at a heated
filament of LaB6 or W. Bias voltages V1 and V2 accelerate the electrons to
the proper energy (6.5 eV) to maximize the dissociative attachment of a
beam of O2 gas (that is, the separation of O2 molecules into O⁻ ions). A
solenoidal magnetic field provided by superconducting coils contains the
electrons e and the ions O⁻ produced in the dissociative-attachment
process.

Fig. 17. Accelerated electrons strike a beam of O2 gas in the
dissociative-attachment region, producing O⁻ ions. The O⁻ ions are
accelerated to the desired final energy and pass through the
photodetachment region to form O(³P) atoms. These pass between electric
field plates to remove O⁻ and e and then strike the target.

Abstract. A bimodal optical computer (BOC) for solving a system of linear
equations is presented. The BOC can achieve accuracies comparable to those
of the digital computer, and its speed is far superior in solving a system
of linear equations. The advantage in speed increases with the size of the
matrix. The problem of the convergence of the solution using the BOC is
investigated. It is found that by using a BOC with an error as high as 50%
in the matrix's optical mask and 1% in the electro-optical devices,
convergence is achieved for matrices with condition numbers of 25. The
effect of the condition number on the convergence of the solution is
investigated. It is found that matrices with large condition numbers
converge very slowly. Convergence for matrices with condition numbers
higher than 250 was achieved. A means of improving the condition number of
a matrix is also introduced.

Subject terms: optical computing and nonlinear optical signal processing;
numerical processors; matrix processors; optical hybrid processors;
convergence; algebra.
1.  INTRODUCTION

Analog  optics  is  very  attractive  for  signal  processing  and 
computing  because  of  its  ability  to  process  two-dimensional 
data  in  parallel  very  rapidly.  Unfortunately,  this  high  speed 
parallel  processing  achieves  only  low  accuracy  because  of  the 
nature  of  the  analog  processing,  especially  in  optical  systems, 
where  accuracy  problems  arise  from  errors  in  writing  and 
reading  the  signals  using  the  I/O  electro-optical  devices.  In 
contrast,  digital  electronics  is  much  slower  but  much  more 
accurate. A compromise (hybrid) system, the bimodal optical computer,
appears to be intermediate in both speed and accuracy. This method,
introduced by Caulfield et al.1 and described in Sec. 2, combines the high
speed and parallelism of analog optics with the high accuracy of digital
electronics using Lord Kelvin's iterative method.2 In Sec. 3 we present a
numerical  analysis  for  the  convergence  of  the  solution  of  a 
system  of  linear  equations.  In  Sec.  4  we  present  computer 
simulations of the BOC to study the dependence of the solution convergence
on the condition number of the matrix and on the errors in representing the
I/O data in the optical system. In Sec. 5 we compare the time required to
solve a system of linear equations using the BOC to that required by the
digital computer. In Sec. 6 a means of reducing the condition number of a
matrix is examined, and in Sec. 7 conclusions and final remarks are drawn.

2.  BIMODAL  OPTICAL  COMPUTER  ALGORITHM 

The bimodal optical computer works in the following manner for solving a
system of linear equations:

    $A x = b$,    (1)

where A is an n × n matrix and x and b are n × 1 vectors. A and b are
given. The x is unknown and is computed as follows:

(a) Use the optical analog processor to compute an approximate solution
$x_0$ of the linear system. The subscript zeros indicate inaccuracies in
the optics and electronics, so the system of equations solved by the
optical analog processor is

    $A_0 x_0 = b_0$.    (2)

(b) Store the solution $x_0$ to a high accuracy with the digital computer.
Use a dedicated digital processor to calculate the residue

    $r = b - A x_0 = A(x - x_0) = A \Delta x$.    (3)

Fig. 1. System layout of the bimodal optical computer.


Fig. 2. Spectral radius S(M) as a function of the standard deviation of the
error matrix.


(c) Use the optical analog processor to solve the new linear system for
$\Delta x$:

    $A_0 y = s r_0$,    (4)

where $y = s \Delta x$ and s is a "radix," or scale factor, chosen to make
good use of the dynamic range of the system.

(d) Use the digital processor to refine the solution $x_0$ for $x_1$:

    $x_1 = x_0 + \Delta x$.    (5)

If the refined solution $x_1$ is accurate enough, terminate the iterations.
Otherwise, return to (b), (c), and (d) for a more refined solution.
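
Steps (a) and (c) each ask the analog processor for an approximate solution of a linear system. One way such a solver can operate is as a feedback relaxation loop, in which the mismatch between b and the detected row sums z = Ax drives x until the two agree. The sketch below assumes the continuous-time dynamics dx/dt = b − Ax, integrated here with a simple Euler step; the dynamics, the step size, and the function name are assumptions made for illustration, not details confirmed by the text.

```python
import numpy as np

def relax(A, b, dt=0.05, steps=2000):
    """Euler integration of the assumed feedback dynamics dx/dt = b - A x."""
    x = np.zeros_like(b, dtype=float)
    for _ in range(steps):
        z = A @ x              # row sums detected behind the mask
        x = x + dt * (b - z)   # feedback drives x until z matches b
    return x

# Small symmetric positive definite example.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = relax(A, b)
print(x, np.linalg.solve(A, b))
```

With these dynamics the loop settles for any matrix whose eigenvalues have positive real parts, provided the Euler step is small enough for the discrete update to remain stable.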

Figure 1 is a block diagram of the BOC. The solution of the linear
algebraic equation will be computed optically using the method introduced
by Cheng and Caulfield.3 The heart of the processor is the fully parallel
Stanford matrix-vector multiplier.4 Input lights representing x components
are spread vertically onto the columns of an attenuating mask representing
A. Row sums of the transmitted light are detected to give components of the
output vector z. For all k, we allow

    $\dot{x}_k = b_k - z_k$    (6)

to drive $x_k$. Here, $z_k$ is a component of the calculated z = Ax.

3.  CONVERGENCE  OF  THE  SOLUTION

The convergence of the solution of the problem depends on two factors: the
convergence of the solution of the system given in Eq. (2) by the analog
processor and the convergence of the solution for the system given by
Eq. (1) by the optical-hybrid processor (BOC). The convergence of the
solution of Eq. (2) is discussed by Cheng and Caulfield. They report that
if the matrix is positive definite (a matrix with positive eigenvalues),
then the solution will converge regardless of the size of the matrix. This
simply applies to step (a) of the procedure outlined in Sec. 2.

We turn next to the total process, presenting a numerical analysis for the
convergence of the solution and its dependence on the condition number of
the matrix. The condition number of the matrix A is defined as

    $\chi(A) = \|A\| \cdot \|A^{-1}\|$,    (7)

where the double bars denote the norm of the matrix. If we consider the
Euclidean norm, then

    $\chi(A) = \|A\|_2 \, \|A^{-1}\|_2 \ge \frac{|\lambda|_{max}}{|\lambda|_{min}}$,    (8)

where $|\lambda|_{max}$ and $|\lambda|_{min}$ are the maximum and minimum
eigenvalues of the matrix A. The equality is satisfied if A is a symmetric
positive definite matrix. The condition number is a measure of the accuracy
of the Ax = b solutions. The larger the condition number, the less accurate
the result achieved with any fixed-accuracy computer.

From Eq. (2) the solution $x_0$ is given by

    $x_0 = B_0 b_0$,    (9)

where $B_0 = (A_0)^{-1}$, and if $x_i$ is the solution after the ith
iteration, then



    $x_{i+1} = x_i + \Delta x$,    (10)

where $\Delta x$ is given by

    $\Delta x = B_0 r$.    (11)

Therefore,

    $x_{i+1} = (I - B_0 A) x_i + B_0 b$,    (12)

where I is the identity matrix. The condition for the convergence of the
solution given in Eq. (12) is that5

    $S(M) < 1$,    (13)

where $M = I - B_0 A$ and S(M) is the spectral radius of the matrix M,
which is equal to the absolute value of the maximum eigenvalue
$\lambda_{max}(M)$ of the matrix M. Representing the matrix A with an
optical mask (a photographic film or an SLM) is the major source of the
error. We need to examine how accurate this mask should be to achieve
solution convergence. Let us represent the mask's matrix by

    $A_0 = A + E$,    (14)

where E is an error matrix. For simulations, E is generated by a Gaussian
random number generator with a standard deviation $\sigma_E$. In Fig. 2 the
spectral radius S(M) of the matrix M is plotted versus the standard
deviation $\sigma_E$ of the error matrix for a matrix A with a maximum
coefficient of unity (any matrix can be normalized to take this form). It
is clear that the spectral radius increases as $\sigma_E$ increases, which
slows the rate of convergence. The interesting result is that for this
particular matrix, convergence is achieved even with an error matrix of
>50%.
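
The dependence of S(M) on the mask error can be explored with a few lines of code. The sketch below is only an illustration of Eqs. (12)-(14): the 8 × 8 test matrix, the random seed, and the list of error levels are arbitrary choices made here and do not reproduce the matrix behind Fig. 2.

```python
import numpy as np

def spectral_radius_of_M(A, E):
    """S(M) for M = I - B0 A with B0 = (A + E)^-1, following Eqs. (12)-(14)."""
    M = np.eye(A.shape[0]) - np.linalg.solve(A + E, A)
    return np.max(np.abs(np.linalg.eigvals(M)))

rng = np.random.default_rng(2)
n = 8
A = np.eye(n) + 0.1 * rng.uniform(-1.0, 1.0, (n, n))
A = A / np.max(np.abs(A))                  # maximum coefficient of unity
sigmas = [0.01, 0.05, 0.1, 0.3, 0.5]
radii = [spectral_radius_of_M(A, s * rng.standard_normal((n, n))) for s in sigmas]
for s, rad in zip(sigmas, radii):
    verdict = "converges" if rad < 1.0 else "diverges"
    print(f"sigma_E={s:.2f}  S(M)={rad:.3f}  {verdict}")
```

Since M = I − (A + E)⁻¹A = (A + E)⁻¹E, the spectral radius vanishes for a perfect mask and grows with the size of E relative to A + E.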

4.  COMPUTER  SIMULATIONS 

In  this  section  we  present  a  computer  simulation,  using  the 
BOC,  of  the  procedure  outlined  in  Sec.  2.  In  the  simulation  we 
consider  matrix  masks  with  different  accuracies.  We  also 
consider  the  accuracy  of  the  LEDs  and  photodiodes  to  be  1% 
in  writing  and  reading  the  data.  Here,  we  are  interested  in 
finding  the  number  of  iterations  required  for  the  solution  to 
converge  to  a  preset  accuracy. 

4.1.  Condition  number  effects 

The condition number is a measure of the sensitivity of the solution of
Eq. (1) to any variations. In the first part of the simulation, we tested
the condition number and its sensitivity to the error matrix of the optical
mask. In Fig. 3 the condition number of the mask's matrix is plotted as a
function of the standard deviation of the error matrix, $\sigma_E$. The
maximum coefficient of the matrix A is kept equal to unity. In Fig. 3(a) we
consider a matrix with a condition number equal to 60. This curve shows
that with the increase of the error in representing the matrix by an
optical mask, the condition number improves, except for a very few points
where the error values caused the condition number to increase. In
Fig. 3(b) a matrix with $\chi(A) = 300$ is considered. Here, for the entire
range of $\sigma_E$, the condition number of the mask's matrix is much
smaller than 300. This interesting result shows that if we start with an
ill-conditioned matrix, its mask can be well-conditioned. This will help in
solving problems in which the matrix is ill-conditioned.

In testing the effects of the condition number on the convergence of the
solution of the system of linear equations, we used the BOC to solve the
system with a 16 bit resolution. The matrices were generated randomly using
Gaussian statistics. An error of 1% of the maximum coefficient of the
matrix was added to generate the mask. An error of 1% also was used in
reading $x_0$ and in writing $b_0$. In each case we computed the condition
number of the generated mask's matrix. The number of iterations required
for convergence of the solution was determined for each case. The
iterations were terminated if they exceeded 25 and also if
$|r_k| / |r_{k-1}| > 1$, which is the condition of a solution divergence.
The number of iterations required for convergence of the solution with 16
bit accuracy is plotted as a function of the condition number of the mask's
matrix in Fig. 4. In these calculated data points it is quite evident that
the number of iterations increases with the increase of the condition
number, which is a predicted result.4 The increase of the condition number
decreases the accuracy of the solution, so more iterations are needed to
achieve the desired accuracy. From Fig. 4 convergence was achieved for
condition numbers as high as 230, and even with $\chi(A) > 1000$ in our
experiments, convergence was achieved for some cases.

4.2.  Effect  of  the  mask's  error 

The major limiting factor on the speed of convergence is the accuracy with
which we can represent the matrix with an attenuating optical mask. In the
present state of the art, accuracies of 3 to 5% are achievable. We would
like to see how this will affect the speed of convergence of the solution.
In this simulation we started with a matrix with a condition number
$\chi(A) = 24$. Error matrices E were generated using Gaussian statistics
with coefficients ranging between 1 and 55% of the maximum coefficient of
the matrix A. The optical masks were generated by adding A to E. These
masks were then used in the BOC to solve the system of linear equations.
The number of iterations required to achieve the solution with the desired
accuracy (16 bit resolution) was computed. In Fig. 5 the number of
iterations is plotted as a function of the standard deviation of the error
matrix, $\sigma_E$. The number of iterations increases with the increase of
the error in the mask, as expected.6 However, even with errors as high as
55% in representing the matrix A with an optical mask, convergence is still
retained. Of course, a larger number of iterations is required. This result
is very important. It means that even with optics that are not so accurate,
we can realize this optical computer that solves this class of linear
algebra problems with a high accuracy and speed.

Fig. 4. Number of iterations as a function of the condition number of a
randomly generated matrix A.

Fig. 5. Number of iterations as a function of the standard deviation of the
error matrix.

[Fig. 8 caption, partially legible:] ... iterations for $\chi(A)$ = 13, 26,
42, and 66.

Other observations recorded in this simulation need to be highlighted.
First, the condition number of the mask's matrix A + E is computed for the
set of error matrices E. In Fig. 6 the condition number of the mask's
matrix, $\chi(A + E)$, is plotted as a function of the error $\sigma_E$.
The condition number of the mask decreases with the increase of the error
almost exponentially, which is surprising since it appears to contradict
the result shown in Fig. 4. We showed there that for large condition
numbers we need more iterations, while here more iterations are needed for
small $\chi(A_0)$. But, indeed, it is not a contradiction. Here, although
these matrices have low condition numbers, they are very different from the
matrix A given by the system of linear equations because of the large error
involved, which makes the convergence very slow.

Second, we found in the results of the simulation that if the condition
number of the mask increases for a large error, the solution will diverge,
because the solution obtained in each iteration has high inaccuracies.
This, in return, makes the convergence either very slow or not achievable.

Finally, we note the relationship between the spectral radius S(M) of the
matrix M given by Eq. (13) and the number of iterations. We computed S(M)
for each mask of Fig. 5. The number of iterations required for a solution
with an error $\epsilon$ is given by5

    $N = \frac{-\log(\epsilon)}{R(M)}$,    (15)

where R(M) is the asymptotic rate of convergence,

    $R(M) = -\log[S(M)]$.    (16)

In Fig. 7 the number of iterations required to get a solution with a 16 bit
accuracy ($\epsilon = 1/2^{16}$) is plotted versus the spectral radius
S(M). Equation (15) is plotted as a continuous line, while the data
computed in the simulation are plotted as squares. The theoretical and
experimental data agree well. As S(M) increases, the number of iterations
increases, and as S(M) approaches unity, the convergence becomes very slow.
For values of S(M) larger than unity the solution will diverge.
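
The comparison between Eq. (15) and a simulated iteration count can be sketched as follows. This is a simplified illustration: it neglects the 1% reading and writing errors used in the simulations above, measures convergence against the exact solution, and uses an arbitrary 8 × 8 test matrix with a 2% mask error; the function names are hypothetical.

```python
import math
import numpy as np

def predicted_iterations(S, eps):
    """Eq. (15) with Eq. (16): N = -log(eps) / R(M), R(M) = -log S(M)."""
    return math.log(eps) / math.log(S)

def simulated_iterations(A, A0, b, eps, max_iter=500):
    """Count refinement passes until the relative error drops below eps."""
    x_true = np.linalg.solve(A, b)
    x = np.linalg.solve(A0, b)                  # step (a), I/O errors neglected
    for k in range(1, max_iter + 1):
        x = x + np.linalg.solve(A0, b - A @ x)  # steps (b) to (d)
        if np.max(np.abs(x - x_true)) <= eps * np.max(np.abs(x_true)):
            return k
    return None

rng = np.random.default_rng(3)
n = 8
A = np.eye(n) + 0.1 * rng.uniform(-1.0, 1.0, (n, n))
A = A / np.max(np.abs(A))
A0 = A + 0.02 * rng.standard_normal((n, n))     # mask with a 2% error
b = rng.uniform(-1.0, 1.0, n)
M = np.eye(n) - np.linalg.solve(A0, A)
S = np.max(np.abs(np.linalg.eigvals(M)))
eps = 2.0 ** -16
print(S, predicted_iterations(S, eps), simulated_iterations(A, A0, b, eps))
```

Equation (15) is an asymptotic estimate, so the simulated count for a particular right-hand side can differ from it by a few iterations.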

4.3.  Rate  of  convergence

So far we have considered solutions with a 16 bit accuracy. We are
interested in determining how many more iterations are needed to get a
higher accuracy of the solution. In Fig. 8 the natural logarithm of the
maximum component of the residue |r| is plotted as a function of the number
of iterations for a set of matrices with different condition numbers,
$\chi(A)$ = 13, 26, 42, and 65. The smaller the condition number, the
higher the accuracy achieved in fewer iterations. For the condition number
13, one iteration can increase the accuracy by as many as 5 bits.

5.  COMPUTATIONAL  SPEED  ANALYSIS 

To get some quantitative values for the speed of this process compared to
that of the digital computer, we calculate the number of operations
required by each method, then multiply the result by the time required for
each operation. We consider the total number of operations regardless of
whether they are multiplications or additions.

Let us consider an n × n matrix A. The time required for one iteration of
the procedure outlined in Sec. 2, $T_{O1}$, is given by

    $T_{O1} = T_{A1} + 2n(n + 1) T_{D1}$,    (17)

where $T_{A1}$ is the time required to solve $A_0 x_0 = b_0$ by analog
optics and $T_{D1}$ is the time required for one digital operation.
Therefore, the time required to do $I_O$ iterations with the BOC is

    $T_O = I_O [T_{A1} + 2n(n + 1) T_{D1}]$,    (18)

while the time required by the digital computer to solve the system of
linear equations using Cholesky's method1 in $I_D$ iterations is

    $T_D = (n^3/3 + 2n^2) I_D T_{D1}$.    (19)

The condition we need to satisfy in order to have an advantage in time in
using the BOC over the digital computer is

    $T_O \ll T_D$.    (20)

Therefore, for a clear advantage of the BOC, from Eqs. (18) to (20) we want

    $I_O [T_{A1} + 2n(n + 1) T_{D1}] \ll (n^3/3 + 2n^2) I_D T_{D1}$    (21)

or

    $k [T_{A1} + 2n(n + 1) T_{D1}] \ll (n^3/3 + 2n^2) T_{D1}$,    (22)

where $k = I_O / I_D$. Then, Eq. (22) can be rewritten in the form

    $\frac{(n^3/3) + 2n^2(1 - k) - 2nk}{k} \cdot \frac{T_{D1}}{T_{A1}} \gg 1$.    (23)

The advantage in speed in using the BOC over the digital computer is
obvious from Eq. (23), and it increases as the size of the matrix n
increases. To examine this condition carefully, let us rewrite Eq. (23) in
the form

    $A_p A_i \gg 1$,    (24)

where

    $A_p = \frac{2[(n^3/6) + n^2(1 - k) - nk]}{k}$    (25)

and

    $A_i = T_{D1} / T_{A1}$.    (26)

Here, $A_i$ is an "inherent advantage." A single analog operation is much
faster than a digital one. The entire Ax = b solution will be slower than a
single digital operation, but the analog Ax = b solver works at speeds
independent of n. On the other hand, $T_{D1}$ is operation dependent. It
includes the time required for performing the operation and storing and
retrieving the data from computer memory, which is time consuming,
especially as n increases.

The factor $A_p$ is a problem-related advantage. It is a function of the
size of the matrix, n, and the ratio of the iterations, k. The operation
advantage $A_p$ is plotted in Fig. 9 as a function of n and k. It is clear
that $A_p$ increases rapidly as n increases, even if the number of
iterations in the BOC is much larger than in the digital processor, while
in reality the number of iterations of the two processes will be
approximately the same for well-conditioned matrices.

Fig. 9. Operation advantage in Eq. (25) as a function of the size of the
matrix for k = 1, 10, and 20.
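
The speed comparison can be tabulated directly. The sketch below reads Eq. (25) as $A_p = 2[(n^3/6) + n^2(1-k) - nk]/k$ and Eq. (26) as $A_i = T_{D1}/T_{A1}$, which is the form that follows algebraically from Eqs. (18) and (19); the timing values are hypothetical and chosen only to make the example concrete.

```python
def boc_time(n, iters_boc, t_analog, t_digital):
    """Eq. (18): each BOC iteration costs one analog solve plus
    2n(n+1) digital operations."""
    return iters_boc * (t_analog + 2 * n * (n + 1) * t_digital)

def digital_time(n, iters_dig, t_digital):
    """Eq. (19): (n^3/3 + 2n^2) digital operations per iteration."""
    return (n**3 / 3 + 2 * n**2) * iters_dig * t_digital

def problem_advantage(n, k):
    """A_p of Eq. (25), with k = I_O / I_D."""
    return 2 * (n**3 / 6 + n**2 * (1 - k) - n * k) / k

def inherent_advantage(t_digital, t_analog):
    """A_i of Eq. (26)."""
    return t_digital / t_analog

# Hypothetical timings: 100 ns per digital operation, 1 us per analog solve.
t_digital, t_analog = 1e-7, 1e-6
for n in (10, 100, 280):
    for k in (1, 10, 20):
        total = problem_advantage(n, k) * inherent_advantage(t_digital, t_analog)
        print(f"n={n:3d}  k={k:2d}  A_p*A_i={total:.3g}")
```

For small n and large k the product can fall below 1 or even turn negative, which simply means the digital refinement work in the BOC already exceeds the cost of the direct digital solve.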


6.  CONDITION  NUMBER  REDUCTION 

As mentioned earlier, the condition number is an indication of how accurate
the solution of the system of linear equations will be. The larger the
condition number $\chi(A)$, the more iterations are needed for solution
convergence. One way of reducing the condition number of a given matrix is
to normalize the matrix in the following manner:

    $a^n_{ij} = \frac{a_{ij}}{[\text{denominator illegible}]}, \qquad i = 1, 2, \ldots, n$,    (27)

where the $a_{ij}$ are the coefficients of the matrix A and the $a^n_{ij}$
are the coefficients of the normalized matrix $A_n$. The ratio of the
condition number of the normalized matrix $A_n$, $\chi(A_n)$, to that of
the original matrix, $\chi(A)$, is plotted in Fig. 10. It is clear that the
normalized matrix has a smaller condition number than the original matrix
by a factor of approximately 0.8. We expect this to decrease the number of
iterations substantially.
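
The effect of a row normalization on the condition number can be seen on a small example. The sketch below assumes that each row is divided by its Euclidean norm, which is one common choice of row scaling and may differ from the normalization of Eq. (27); the 2 × 2 matrix is a hypothetical example with badly scaled rows.

```python
import numpy as np

def normalize_rows(A):
    """Divide each row by its Euclidean norm (an assumed choice of scaling)."""
    row_norms = np.linalg.norm(A, axis=1, keepdims=True)
    return A / row_norms

A = np.array([[1000.0, 2000.0],
              [3.0, 4.0]])
A_n = normalize_rows(A)
print(np.linalg.cond(A), np.linalg.cond(A_n))
```

When such a scaling is used to solve Ax = b, each component of b has to be divided by the same row factor so that the solution x is unchanged.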

7.  CONCLUSIONS 

The speed, accuracy, and convergence of the bimodal optical
computer are discussed. The BOC is similar to the digital
computer in its accuracy but is faster in solving a system of
linear equations than the digital computer. The speed advantage
increases with the size of the matrix, which makes it a more
attractive computer. The convergence of the solution as a
function of the condition number and the error in the I/O
devices is also analyzed. It is found that solutions converge
even for about 50% errors in the representation of the matrix by
an optical mask. Although this error will reduce the speed, it
will not lead to a sacrifice in the accuracy of the solution.
Thus, even with today's inaccurate analog optics, we can have a
powerful computer to solve this class of linear algebra
problems. Normalization of the matrix will reduce its condition
number, which will lead to a faster convergence.

Fig. 10. Ratio of the condition number of the normalized matrix to
that of the original matrix as a function of the condition number of
the original matrix.

The BOC is capable of solving other problems, both linear
and nonlinear, in addition to the system represented by Eq. (1).

Beginning with Lord Kelvin¹ and continuing through recent
work of ours,²,³ researchers have been interested in ways
to use a fast low-accuracy processor and a slow high-accuracy
processor together to obtain accurate results at an
intermediate but still high speed. We showed earlier² that to
guarantee convergence with a fast processor of accuracy
$\epsilon$ (expressed as rms relative error) on an Ax = b problem
in which the condition number is $\chi(A)$, the requirement (in
approximate form) is

$$ \chi(A)\,\epsilon < 1/2 . \qquad (1) $$

For analog optical algebra processors, $\epsilon = 1/50$ (2%) is
excellent. This means we may expect some failures of the process
for $\chi(A) > 25$. Since many matrices have much higher
condition numbers, this is a severe restriction. In subsequent
publications¹⁻³ we showed that convergence was achievable for
many matrices with $\chi(A) \gg 1/(2\epsilon)$ and that A can be
preconditioned (rearranged without changing its meaning) to
improve $\chi(A)$. These steps brought the resulting bimodal
optical computer (BOC) to the point where it was practical for
some real but restricted situations.

Our goals in this Communication are twofold. First, we
wish to remove some restrictions on the condition number
and thus achieve convergence for a wide range of problems.
Second, we wish to remove the restriction we imposed on the
Ax = b solver, whose convergence was limited to positive
definite matrices A, and thus guarantee convergence for other
matrices by modifying the algorithm.

Although the BOC can be applied to all linear algebra
problems, we pick the general Ax = b problem for illustration.
We review here the basic ideas:

(1) Solve Ax = b optically to get $x_0$.

(2) With a specialized digital processor, evaluate to high
accuracy

$$ r_0 = b - A x_0 . \qquad (2) $$

(3) Normalize $r_0$ to keep solutions in range.

(4) Solve optically

$$ A(\Delta x_0) = r_0 . \qquad (3) $$

(5) Evaluate digitally

$$ x_1 = x_0 + \Delta x_0 , \qquad (4) $$

$$ r_1 = b - A x_1 . \qquad (5) $$

(6) If $\|r_1\|$ is small enough, stop. Otherwise, go to (3) and
recycle.

In the optical steps, replace A with a new matrix $A_0$ derived
from A by adding noise to it:

$$ A_0 = A + E , \qquad (6) $$

where E is an error matrix generated using Gaussian statistics
with a standard deviation $\sigma_E$. The new matrix will have
a much better condition number, especially for an
ill-conditioned starting matrix A. The digital correction steps
keep the solution headed toward Ax = b, not $A_0 x = b$. In analog
processors adding E is automatic because of system noise.
We treat $\sigma_E$ hereafter as the standard deviation of system
noise.
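
The loop of steps (1) to (6) with a noisy optical matrix $A_0 = A + E$ can be sketched in a few lines. This is only a simulation sketch under stated assumptions: the "optical" solver is replaced by exact Gaussian elimination on $A_0$, E is drawn once and held fixed, its standard deviation is taken relative to the largest entry of A, and all function names and the 3 x 3 test matrix are hypothetical.

```python
import random


def solve(A, b):
    """Gaussian elimination with partial pivoting (stand-in for the optical solver)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def boc_solve(A, b, sigma_e=0.02, tol=1e-12, max_iter=200, seed=0):
    """Iterative refinement: noisy solves with A_0, exact residuals with A."""
    rng = random.Random(seed)
    n = len(A)
    scale = max(abs(a) for row in A for a in row)
    # Eq. (6): A_0 = A + E, with Gaussian E (assumed fixed over all iterations).
    A0 = [[A[i][j] + rng.gauss(0.0, sigma_e * scale) for j in range(n)]
          for i in range(n)]
    x = solve(A0, b)                                   # step (1)
    for it in range(1, max_iter + 1):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]  # steps (2)/(5), digital
        rnorm = max(abs(v) for v in r)                 # infinity norm
        if rnorm <= tol:                               # step (6)
            return x, it
        dx = solve(A0, [v / rnorm for v in r])         # steps (3)-(4)
        x = [xi + rnorm * di for xi, di in zip(x, dx)]  # step (5)
    return x, max_iter


A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = boc_solve(A, b)
print(iterations, x)
```

Because the residual is always computed with the true A, the iterates approach the solution of Ax = b even though every solve uses the perturbed matrix.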

The method proved capable of solving systems of linear
equations with a wide range of condition numbers. The
convergence to the solution is very rapid for small condition
numbers and for very large condition numbers (near singular
and singular) but not as rapid for intermediate values of the
condition number. It works best for singular, underdetermined,
or overdetermined systems.

In Table I the number of iterations $N_I$ required for
convergence to a solution with 16-bit accuracy is tabulated as
a function of the error involved in the calculations. These
results are obtained using a computer simulation of the
bimodal optical computer. The errors considered in the
calculation are defined as follows:

$\sigma_E$ = the standard deviation of the error matrix E;
$\sigma_b$ = the standard deviation of the error in writing the
vector b; and
$\sigma_x$ = the standard deviation in reading x.

The results in Table I are for singular matrices A. The
matrices considered here are 10 × 10 and have ranks of 9 and
1, respectively. When an error is added to the vector b the
solution diverges, but by adding an error matrix E to A the
solution converges very rapidly, as shown in the table. This
is true for different values of the error standard deviations.
The technique does work even with a processor with errors
larger than those shown in Table I. We have also considered a
set of Hilbert matrices,⁶ which are very ill-conditioned and
whose condition number increases very rapidly with increasing
size. These are used as test matrices for our new technique,
which was able to achieve convergence very rapidly, especially
when there are errors in the vector b.

Thus this technique makes the solution converge for a
system of equations which cannot be solved with ordinary
techniques. We have tested this technique for matrices with
sizes N up to 12 × 12 and with different ranks from (N - 1) to
1. For all these cases it does work with a high speed of
convergence.

Table I. Convergence Behavior of 10 × 10 Matrices of Ranks 9 and 1 for
Various Additive Errors

                              10 × 10, Rank = 9        10 × 10, Rank = 1
σ_E      σ_b      σ_x      N_I   ||r_2||/||r_1||    N_I   ||r_2||/||r_1||
0        0        0        1                        1
0        1.E-6    0        4     0.5585             D     5.1E+26
1.E-6    1.E-6    0        1     3.4E-6             1     8.9E-7
0        1.E-6    1.E-6    D     2.3273             D     6.7E+26
1.E-6    1.E-6    1.E-6    1     9.2E-6             1     1.2E-5
0        0.01     0        D     1.000              D     2.4E+64
1.E-6    0.01     0        4     2.2E-2             2     2.9E-3
0        0.01     0.01     D     1.1953             D     1.2E+31
1.E-6    0.01     0.01     9     2.1E-1             5     6.9E-2
1.E-4    0.01     0.01     7     1.7E-1             4     3.6E-3
0.01     0.01     0.01     8     2.5E-1             5     1.3E-1
0.05     0.01     0.01     11    2.4E-1             4     2.0E-1
0.10     0.01     0.01     18    7.2E-1             4     8.6E-2

N_I is the number of iterations required to achieve ||r_{N_I}|| ≈ 0 to
within 16 bits. The ratio ||r_2||/||r_1|| gives another measure of the
convergence (or divergence, indicated by D) rate. We have used the
infinity norm for convenience.

In previous papers we showed that the parallel analog
processor proposed for the BOC is capable of solving only
systems of linear equations with positive definite matrices.
Here we will ease this restriction. In general the matrix A
has complex eigenvalues. If the matrix A is multiplied by
its Hermitian conjugate $A^H$, the matrix $A^H A$ will have
non-negative eigenvalues. Now multiply the system Ax = b by
$A^H$; then

$$ A^H A x = A^H b . \qquad (7) $$

Equation (7) gives a new system of linear equations with
non-negative eigenvalues. In practice adding E to $A^H A$
results in a positive definite matrix. Thus general systems can
be solved in this manner. Replacing A by $A^H A$ increases the
condition number (squaring it at the most) but does not prevent
convergence to an accurate solution of Eq. (7). Equation (7)
is, of course, not as well posed as Ax = b. Therefore, we
must use the residual of Eq. (2), not the residual of Eq. (7).
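
The two properties used in this argument can each be checked in one line. The derivation below is a standard linear-algebra argument rather than part of the original text; it assumes a nonsingular A and the 2-norm condition number, and the symbols $s_{\max}$, $s_{\min}$ (largest and smallest singular values of A) are introduced here only for illustration.

```latex
% Non-negativity: for any vector x,
x^{H} (A^{H} A)\, x = (A x)^{H} (A x) = \| A x \|_2^{2} \ge 0 ,
% so A^H A is Hermitian with real, non-negative eigenvalues.

% Condition number: the eigenvalues of A^H A are the squared singular
% values of A, hence in the 2-norm
\chi_2(A^{H} A) = \frac{s_{\max}^{2}}{s_{\min}^{2}} = \chi_2(A)^{2} .
```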

With this method most problems behave as simple problems:
they converge and converge rapidly. This has been applied
successfully to Ax = b problems that are determined,
underdetermined (linear programming), or overdetermined.

The method is purely algebraic and is, therefore, simply an
improved approach to some numerical algebra problems.
The fact that it is suited for BOC use is an independent fact.

If we can achieve fast convergence almost independently
of condition number, the first practical application may be to
phased array antennas, where the phasing problems are linear
algebra and the primary difficulty is the presence of
jammers: malicious means to increase the condition number.

This research was supported by the Innovative Science
and Technology Office of the Strategic Defense Initiative
Organization, administered through the Office of Naval
Research under contract N00014-86-K-0591.

Let Ax = b be a consistent linear system with A an n × n complex matrix. Suppose that G is a nonsingular
n × n complex matrix for which A + G is nonsingular and zero is not a multiple root of the minimum
polynomial of $G^{-1}A$. It is shown that there exists a positive real number $\rho$ such that whenever $\gamma$ is a
complex number with $0 < |\gamma| < \rho$ the sequence $x_0, x_1, x_2, \ldots$ converges to a solution of Ax = b for every
initial vector $x_0$, where $(A + \gamma G)x_i = \gamma G x_{i-1} + b$ for $i = 1, 2, \ldots$. Related questions are also considered.

In  this  note  theoretical  results  are  presented  that  help  explain  the  observed  behavior 
of  a  standard  iterative  process  for  solving  a  linear  system  of  equations. 

Analog optics is very attractive for performing matrix computations because of its
ability to process two-dimensional data in parallel very rapidly [5]. Unfortunately,
this high speed processing achieves only low accuracy. In contrast, digital electronic
processors are slower but much more accurate. It was recently suggested [4] that
linear systems can be solved iteratively by a method that combines the speed of
analog optics with the accuracy of digital electronics. The proposed method is based
on the usual iterative refinement of approximate solutions of linear systems (for
example, see [6]). To solve a system Ax = b, where A is an n × n matrix, use an
optical analog processor to find an approximate solution $\hat{x}$ of Ax = b.

1. Use a digital electronic processor to compute $r = b - A\hat{x}$.

2. Use the optical processor to find an approximate solution $\hat{e}$ to the system Ae = r.

3. Use the digital processor to refine the approximate solution of Ax = b to
$\tilde{x} = \hat{x} + \hat{e}$. If $\tilde{x}$ is accurate enough, terminate the iterations; otherwise, set $\hat{x} = \tilde{x}$
and return to step 1.

Due to inaccuracies in writing and reading the signals using electro-optical devices,
the optical processor solution $\hat{x}$ of Ax = b is the exact solution of a perturbed system
$(A + E)\hat{x} = b$, where we assume that the matrix A + E is nonsingular. In the current
state of the art in optical processing, the magnitudes of the entries of the error matrix
E may be from three to five percent of the maximum magnitude of the entries of A.

* This work was supported by the Innovative Science and Technology Office of the Strategic
Defense Initiative Organization, administered through the Office of Naval Research under contract
N00014-86-K-0591.

In numerical experimentation on a digital electronic computer, the error matrices
were randomly generated [1], [2]. In extensive experiments, it was found that A + E
would be nonsingular and that the approximate solutions converged to a solution
of Ax = b. Application of the method can be viewed as a preconditioning of the
system. It was applied to systems for which A was nonsingular with condition number
varying from small to quite large [1], [4]. Later, consistent systems with singular
coefficient matrices were solved, and it was found that convergence to a solution of
Ax = b was in practice even better than for A nonsingular [2].

When the method is applied, a sequence $x_0, x_1, x_2, \ldots$ of approximate solutions
to the consistent system Ax = b is generated, where

$$ (A + E)x_i = E x_{i-1} + b , \quad i = 1, 2, \ldots . $$

It is well known (e.g., see [3]) that for nonsingular A + E this iteration converges to a
solution of Ax = b for every initial vector $x_0$ if and only if $\lim_{m \to \infty} ((A + E)^{-1}E)^m$ exists.
If this limit exists, for n × n complex matrices A and E, then we say that E is an
acceptable error matrix for A. Let $H_n = (h_{ij})$ be the n × n matrix with $h_{i,i+1} = 1$ for
$i = 1, 2, \ldots, n - 1$, and all other entries zero.

Theorem 1 Let A and G be n × n complex matrices with G and A + G nonsingular.

(a) If zero is not a multiple root of the minimum polynomial of $G^{-1}A$, then there
exists a positive real number $\rho$ such that, for all complex numbers $\gamma$ with
$0 < |\gamma| < \rho$, $E = \gamma G$ is an acceptable error matrix for A.

(b) If there exists a nonzero complex number $\gamma$ such that $E = \gamma G$ is an acceptable
error matrix for A, then zero is not a multiple root of the minimum polynomial
of $G^{-1}A$.

Proof Let A have rank r, and suppose that zero is not a multiple root of the
minimum polynomial of $G^{-1}A$. Then there exist a nonsingular n × n matrix P and
a nonsingular r × r matrix Q such that

$$ P^{-1}G^{-1}AP = \begin{bmatrix} 0 & 0 \\ 0 & Q \end{bmatrix} . $$

Let

$$ \rho = (\min\{|\lambda| : \lambda \text{ is an eigenvalue of } Q\})/2 . $$

Then $\rho > 0$. Let $\gamma$ be a complex number with $0 < |\gamma| < \rho$. It follows that $\gamma G$ is
nonsingular and

$$ P^{-1}((\gamma G)^{-1}A + I)P = \begin{bmatrix} I & 0 \\ 0 & I + \gamma^{-1}Q \end{bmatrix} . $$

Let $\lambda$ be an eigenvalue of $I + \gamma^{-1}Q$. Then $\lambda = 1 + \mu/\gamma$ for some eigenvalue $\mu$ of Q.
Thus $|\lambda| \ge |\mu/\gamma| - 1 > 2 - 1 = 1$. Therefore, $I + \gamma^{-1}Q$ is nonsingular and $|\alpha| < 1$ for
each eigenvalue $\alpha$ of $(I + \gamma^{-1}Q)^{-1}$. It then follows that $(\gamma G)^{-1}A + I$ is nonsingular and
that $\lim_{m \to \infty} (((\gamma G)^{-1}A + I)^{-1})^m$ exists. Moreover, since $(\gamma G)^{-1}A + I = (\gamma G)^{-1}(A + \gamma G)$,
we see that $A + \gamma G$ is also nonsingular with

$$ (A + \gamma G)^{-1}\gamma G = ((\gamma G)^{-1}(A + \gamma G))^{-1} = ((\gamma G)^{-1}A + I)^{-1} . $$

Thus $E = \gamma G$ is an acceptable error matrix for A. Therefore part (a) holds.

Now suppose that there exists a nonzero complex number $\gamma$ such that $E = \gamma G$ is
an acceptable error matrix for A, and that zero is a root of the minimum polynomial
of $G^{-1}A$ of multiplicity k, where k > 1. From the Jordan canonical form of $G^{-1}A$,
we see that there exists a nonsingular matrix P such that

$$ P^{-1}G^{-1}AP = \begin{bmatrix} H_k & 0 \\ 0 & Q \end{bmatrix} , $$

where Q is some (n - k) × (n - k) matrix. It follows that

$$ P^{-1}(A + \gamma G)^{-1}\gamma G P = \begin{bmatrix} (I + \gamma^{-1}H_k)^{-1} & 0 \\ 0 & (I + \gamma^{-1}Q)^{-1} \end{bmatrix} . $$

Since k > 1, we see that $\lim_{m \to \infty} ((I + \gamma^{-1}H_k)^{-1})^m$ does not exist, and thus
$\lim_{m \to \infty} ((A + \gamma G)^{-1}\gamma G)^m$ does not exist. This contradiction establishes part (b).

If A is given and E is randomly generated, one would expect E and A + E to be
nonsingular, and that zero would not be a multiple root of the minimum polynomial
of $E^{-1}A$. Therefore, if the entries of E are chosen with magnitudes fairly small in
comparison with the maximum magnitude of the entries of A, Theorem 1 indicates
that E will probably be an acceptable error matrix for A.
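
Both parts of Theorem 1 can be observed numerically on 2 × 2 examples. The sketch below runs the iteration $(A + E)x_i = E x_{i-1} + b$ with $E = \gamma G$; the particular matrices, the choices G = I and $\gamma = 0.1$, the starting vectors, and all helper names are illustrative assumptions rather than examples from the text.

```python
def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def add(M, N):
    return [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]


def iterate(A, E, b, x0, steps):
    """Apply (A + E) x_i = E x_{i-1} + b for the given number of steps."""
    inv = inv2(add(A, E))
    x = list(x0)
    for _ in range(steps):
        Ex = matvec(E, x)
        x = matvec(inv, [Ex[0] + b[0], Ex[1] + b[1]])
    return x


gamma = 0.1
E = [[gamma, 0.0], [0.0, gamma]]      # E = gamma * G with G = I (assumed)

# Singular A for which zero is a simple root of the minimum polynomial:
# the iterates settle on a solution of A x = b.
A_good = [[1.0, 0.0], [0.0, 0.0]]
x_good = iterate(A_good, E, [1.0, 0.0], [5.0, 7.0], 60)

# A = H_2: zero is a double root of the minimum polynomial of G^{-1} A,
# so E is not acceptable and the iterates drift without bound.
A_bad = [[0.0, 1.0], [0.0, 0.0]]
x_bad = iterate(A_bad, E, [0.0, 0.0], [0.0, 1.0], 60)

print(x_good, x_bad)
```

In the second case $(A + E)^{-1}E = I - \gamma^{-1}H_2$, whose m-th power is $I - m\gamma^{-1}H_2$, so the first component of the iterate grows linearly with m.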

Part  (a)  of  Theorem  1  clearly  implies  the  following. 


Corollary 2 Let A and G be nonsingular n × n complex matrices for which A + G
is nonsingular. Then there exists a positive real number $\rho$ such that, for all complex
numbers $\gamma$ with $0 < |\gamma| < \rho$, $E = \gamma G$ is an acceptable error matrix for A.


For nonzero singular matrices A we have the following.


Theorem 3 Let A be an n × n complex matrix of rank r where 0 < r < n. Then there
exists a nonsingular n × n complex matrix G such that A + G is nonsingular and for
each nonzero complex number $\gamma$, $\gamma G$ is not an acceptable error matrix for A.


Proof Let d = n - r, let $n_1, n_2, \ldots, n_d$ be positive integers with $n_1 + n_2 + \cdots + n_d = n$,
and define a block-diagonal matrix J by letting

$$ J = \mathrm{diag}[H_{n_1}, H_{n_2}, \ldots, H_{n_d}] . $$

Since A and J have the same rank, there exist nonsingular matrices P and Q such
that QAP = J. Let $G = (PQ)^{-1}$, and let $\gamma$ be a nonzero complex number. It follows
that $A + \gamma G$ is nonsingular with

$$ (A + \gamma G)^{-1}\gamma G = ((\gamma G)^{-1}(A + \gamma G))^{-1} = P(I + \gamma^{-1}J)^{-1}P^{-1} . $$

Since $n_i > 1$ for some $i = 1, 2, \ldots, d$, we see that $\lim_{m \to \infty} ((I + \gamma^{-1}H_{n_i})^{-1})^m$ does not exist,
and thus $E = \gamma G$ is not an acceptable error matrix for A.

The acceptable error matrices that we have presented for a given matrix A would
appear to have entries with small magnitudes in comparison with the maximum
magnitude of the entries of A. However, the next two theorems show that each n × n
complex matrix A has acceptable error matrices with entries of arbitrarily large
magnitude. For nonsingular A, it is easy to prove the following.

Theorem 4 Let A be a nonsingular complex matrix. If $\gamma$ is any complex number with
positive real part, then $E = \gamma A$ is an acceptable error matrix for A.

Now let $K_n = (k_{ij})$ be the n × n matrix with $k_{n1} = 1$ and all other entries zero. For
nonzero scalars $\gamma$, we see that the matrix $H_n + \gamma K_n$ is nonsingular with inverse equal
to the transpose of $H_n + \gamma^{-1}K_n$. We shall use this in proving the following.

Theorem 5 Let A be a singular n × n complex matrix of rank r, and let d = n - r.
Then there exist d linearly independent n × n complex matrices $E_1, E_2, \ldots, E_d$ such
that for all nonzero complex numbers $\gamma_1, \gamma_2, \ldots, \gamma_d$, $E = \gamma_1 E_1 + \gamma_2 E_2 + \cdots + \gamma_d E_d$ is
an acceptable error matrix for A.

Proof There exist positive integers $n_1, \ldots, n_d$ with $n_1 + n_2 + \cdots + n_d \le n$ and
nonsingular matrices P and Q such that

$$ P^{-1}AP = \mathrm{diag}[H_{n_1}, H_{n_2}, \ldots, H_{n_d}, Q] . $$

For $i = 1, 2, \ldots, d$, let

$$ E_i = P\,\mathrm{diag}[\delta_{i1}K_{n_1}, \delta_{i2}K_{n_2}, \ldots, \delta_{id}K_{n_d}, 0]\,P^{-1} , $$

where $\delta_{ij}$ is the Kronecker delta. Clearly, $E_1, E_2, \ldots, E_d$ are linearly independent.
Let $\gamma_1, \gamma_2, \ldots, \gamma_d$ be nonzero complex numbers, and let $E = \gamma_1 E_1 + \gamma_2 E_2 + \cdots + \gamma_d E_d$.
For $i = 1, 2, \ldots, d$, we have

$$ (H_{n_i} + \gamma_i K_{n_i})^{-1}\gamma_i K_{n_i} = (H_{n_i} + \gamma_i^{-1}K_{n_i})^{T}\gamma_i K_{n_i} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} . $$

Therefore, $\lim_{m \to \infty} ((H_{n_i} + \gamma_i K_{n_i})^{-1}\gamma_i K_{n_i})^m$ exists for $i = 1, 2, \ldots, d$, and it follows that
A + E is nonsingular and that $\lim_{m \to \infty} ((A + E)^{-1}E)^m$ exists. Thus E is an acceptable
error matrix for A.

ABSTRACT

A novel system for solving systems of nonlinear equations is proposed. Two different
algorithms are introduced. A speed analysis of the two different algorithms is presented and
compared with the speed of their digital computer counterparts. A great advantage in speed is
shown for large size problems.


1. INTRODUCTION

Systems of nonlinear equations arise in the process of solving many physical problems. They
are a very important class of mathematical problems. Iterative methods are used to solve such
problems.

In this paper we propose a new method for solving this class of nonlinear problems using
optical processors. In Section 2 the iterative methods used in solving nonlinear systems of
equations are reviewed. In Section 3 the optical implementation is proposed using two different
algorithms. The speed analysis of the two algorithms is given in Section 4. In Section 5
conclusions and final remarks are drawn.


2. NEWTON'S METHOD

Systems of linear equations are given as follows:

$$ A\vec{x} = \vec{b} , $$

where A is an n × n matrix, and $\vec{x}$ and $\vec{b}$ are n × 1 vectors. In these systems A and $\vec{b}$ are
given and $\vec{x}$, the solution of the system, is unknown.

Nonlinear systems of equations can be represented by

$$ \vec{f}(\vec{x}) = 0 $$

or

$$ f_i(x_1, x_2, \ldots, x_n) = 0 , \quad i = 1, 2, \ldots, n , $$

where the $f_i$'s are nonlinear functions of $\vec{x}$.

One of the methods used in solving for $\vec{x}$ in the nonlinear system of equations is Newton's
method. For a single nonlinear equation, an initial solution, $x_0$, of the equation is assumed, and
the (k+1)th iteration of the solution is given by(1)

$$ x_{k+1} = x_k - (f'_k)^{-1} f_k , \qquad (3) $$

where

$$ f_k = f(x_k) , \quad f'_k = \left. \frac{df}{dx} \right|_{x = x_k} . $$

For a system of nonlinear equations, Eq. (3) can be rewritten as

$$ \vec{x}_{k+1} = \vec{x}_k - (J_k)^{-1}\vec{f}_k , $$

where

$$ (J_k)_{ij} = \left. \frac{\partial f_i}{\partial x_j} \right|_{\vec{x} = \vec{x}_k} , \qquad (6) $$

and J is the Jacobian matrix.

Let

$$ (J_k)^{-1}\vec{f}_k = \vec{c}_k ; $$

then

$$ J_k \vec{c}_k = \vec{f}_k . \qquad (8) $$

Eq. (8) is a system of linear equations to be solved for $\vec{c}_k$, which is the correction needed for the
(k+1)th solution iteration. The algorithm for solving the system of nonlinear equations will be as
follows:

i) Assume a solution $\vec{x}_0$.

ii) Compute the n × 1 vector $\vec{f}_k$ and the n × n matrix $J_k$.

iii) Solve the linear system of equations $J_k \vec{c}_k = \vec{f}_k$ for $\vec{c}_k$.

iv) Compute the refined solution $\vec{x}_{k+1} = \vec{x}_k - \vec{c}_k$.

v) Check if the norm $\|\vec{f}_{k+1} - \vec{f}_k\| < \epsilon$; if so, stop, otherwise go back to step (ii). $\epsilon$ is the
allowable error.
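
Steps (i) to (v) can be sketched on a small example. The code below is an all-digital illustration under stated assumptions: the 2 × 2 test system (a circle of radius 2 intersected with the line x = y), its analytic Jacobian, the starting point, and every function name are hypothetical choices made here, and the stopping test follows step (v) as read above, using the infinity norm.

```python
import math


def f(x):
    """Example nonlinear system: x0^2 + x1^2 = 4 and x0 = x1."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]


def jacobian(x):
    """Analytic Jacobian J_ij = d f_i / d x_j of the example system."""
    return [[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]]


def solve2(J, rhs):
    """Solve the 2 x 2 linear system J c = rhs by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(rhs[0] * J[1][1] - J[0][1] * rhs[1]) / det,
            (J[0][0] * rhs[1] - rhs[0] * J[1][0]) / det]


def newton(x0, eps=1e-12, max_iter=50):
    x = list(x0)                                   # step (i)
    fk = f(x)
    for k in range(1, max_iter + 1):
        c = solve2(jacobian(x), fk)                # steps (ii)-(iii)
        x = [x[0] - c[0], x[1] - c[1]]             # step (iv)
        f_next = f(x)
        if max(abs(a - b) for a, b in zip(f_next, fk)) < eps:  # step (v)
            return x, k
        fk = f_next
    return x, max_iter


root, iterations = newton([1.0, 2.0])
print(root, iterations, math.sqrt(2.0))
```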


3.  OPTICAL  IMPLEMENTATION 

The iterative algorithm introduced in Section 2 requires O(n³) operations when
used with a conventional digital computer. The most expensive part of the algorithm is step (iii),
the solution of a system of linear equations. In previous publications (Refs. 2 ff.) we have
proposed and analyzed a hybrid optoelectronic processor, the Bimodal Optical Computer (BOC),
capable of solving linear systems of equations accurately and rapidly. In this section we
modify that system to be used to solve systems of nonlinear equations, as shown in Fig. 1.
We propose two different algorithms: the first uses the analog processor to solve the system of
equations (8) approximately, and the second uses the BOC to solve the system of equations (8)
exactly (within the specified accuracy).

Fig. 1. Block diagram of the hybrid optoelectronic system.


3.1 Hybrid Analog Optical Processor

In this system we use the optical analog processor to solve Eq. (8) approximately. For this
system we introduce the following algorithm:

a) Use the digital processor to guess an initial solution $\vec{x}_0$.

b) Use the digital processor to compute both the vector $\vec{f}_k$ and the matrix $J_k$.

c) Use the optical analog processor to solve the system $J_k^{o}\,\vec{c}_k^{\,o} = \vec{f}_k^{\,o}$ for $\vec{c}_k^{\,o}$, approximately,
where the superscript o's denote inaccuracies in optics or electronics.

d) Use the digital processor to read $\vec{c}_k^{\,o}$ and compute the refined solution $\vec{x}_{k+1} = \vec{x}_k - \vec{c}_k^{\,o}$.

e) Check if the norm $\|\vec{f}_{k+1} - \vec{f}_k\| < \epsilon$; if so, stop, otherwise go back to step (b) and recycle.
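
The effect of an approximate linear solve inside the Newton loop can be simulated digitally. The sketch below is an illustration under stated assumptions, not a model of the actual optical hardware: the analog processor is mimicked by adding fresh Gaussian noise to the Jacobian entries at every cycle (noise on the vector is omitted), the noise level of 2% of the largest Jacobian entry is an assumed figure, and the test system and all names are hypothetical.

```python
import random


def f(x):
    """Example nonlinear system: x0^2 + x1^2 = 4 and x0 = x1."""
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]


def jacobian(x):
    return [[2.0 * x[0], 2.0 * x[1]], [1.0, -1.0]]


def analog_solve(J, rhs, sigma, rng):
    """Stand-in for the optical processor: solve a noisy copy of J c = rhs."""
    scale = max(abs(v) for row in J for v in row)
    Jo = [[v + rng.gauss(0.0, sigma * scale) for v in row] for row in J]
    det = Jo[0][0] * Jo[1][1] - Jo[0][1] * Jo[1][0]
    return [(rhs[0] * Jo[1][1] - Jo[0][1] * rhs[1]) / det,
            (Jo[0][0] * rhs[1] - rhs[0] * Jo[1][0]) / det]


def hybrid_newton(x0, sigma=0.02, eps=1e-12, max_iter=200, seed=1):
    rng = random.Random(seed)
    x = list(x0)                                        # step (a)
    fk = f(x)                                           # step (b), digital
    for k in range(1, max_iter + 1):
        c = analog_solve(jacobian(x), fk, sigma, rng)   # step (c), approximate
        x = [x[0] - c[0], x[1] - c[1]]                  # step (d), digital
        f_next = f(x)
        if max(abs(a - b) for a, b in zip(f_next, fk)) < eps:  # step (e)
            return x, k
        fk = f_next
    return x, max_iter


root, cycles = hybrid_newton([1.0, 2.0])
print(root, cycles)
```

Since the function values are always evaluated digitally, the noisy solve affects only how many cycles are needed; in this simulation it does not limit the accuracy of the final answer.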

3.2 Hybrid BOC Processor

In this system the BOC is used to solve Eq. (8) exactly. For this system we introduce the
following algorithm:

a) Use the digital processor to guess an initial solution $\vec{x}_0$.

b) Use the digital processor to compute both the vector $\vec{f}_k$ and the matrix $J_k$.

c) Use the BOC to solve the system $J_k \vec{c}_k = \vec{f}_k$ exactly for $\vec{c}_k$.

d) Use the digital processor to read $\vec{c}_k$ and compute the refined solution $\vec{x}_{k+1} = \vec{x}_k - \vec{c}_k$.

e) Check if the norm $\|\vec{f}_{k+1} - \vec{f}_k\| < \epsilon$; if so, stop, otherwise go back to step (b) and recycle.

4. SPEED ANALYSIS

The following speed analysis is based on a system of linear equations of size n.

4.1  Digital  Processor 

The total time required, $T_{DT}$, to solve the system of nonlinear equations using a
conventional digital processor is given by

$$ T_{DT} = [\,n^3/3 + 2n(n+1)\,]\,T_{D1}\,N_D , \qquad (9) $$

where

$T_{D1}$ = the time needed to do one digital operation (e.g., a multiplication),

$N_D$ = the number of iterations needed for the solution convergence.

4.2  Hybrid  Analog  Optical  Processor 

The total time required, $T_{OA}$, to solve the system of nonlinear equations using the processor
introduced in Section 3.1 is given by

$$ T_{OA} = [\,n(n+2)\,T_{D1} + T_{A1}\,]\,N_A , \qquad (10) $$

where

$T_{A1}$ = the time required for the optical analog processor to solve the system of linear
equations (8) approximately,

$N_A$ = the number of iterations required for the solution convergence.

4.3  Hybrid  BOC  Processor 

The total time required, $T_{OB}$, to solve the system of nonlinear equations using the processor
introduced in Section 3.2 is given by

$$ T_{OB} = [\,2n(n+1)\,T_{D1} + T_{A1}\,]\,I_B\,N_D , \qquad (11) $$

where

$I_B$ = the number of iterations needed for the BOC to solve Eq. (8) to the specified accuracy.

4.4 Speed Advantage

It is of great interest to determine the break-even point at which the proposed optical
processors become faster than the digital processor. This condition is defined by

$$ T_{DT} > T_{OA} \qquad (12) $$

and

$$ T_{DT} > T_{OB} . \qquad (13) $$

From Eqs. (9) to (11) the conditions (12) and (13) can be written as

$$ A_a \times A_t > 1 $$

for the hybrid analog processor, where

$$ I_A = N_A / N_D , $$

and for the hybrid BOC processor

$$ \frac{n^3/3 - 2n(n+1)(I_B - 1)}{I_B}\,\frac{T_{D1}}{T_{A1}} > 1 $$

or

$$ B_a \times A_t > 1 , $$

where

$$ A_a = \frac{n^2(n/3 + 1)}{I_A} , \qquad (19) $$

$$ B_a = \frac{n^3/3 - 2n(n+1)(I_B - 1)}{I_B} , \qquad (20) $$

and

$$ A_t = T_{D1} / T_{A1} . $$


The number of iterations, $I_A$ and $I_B$, usually are in the range of 1 to 10.(1) The ratios $A_a$ and $B_a$
are problem dependent, and are much larger than 1 for large values of n. On the other hand, $A_t$
depends on the speed of the analog processor for solving a system of linear equations, which can be
in the range of μsec. But since the matrix $J_k$ needs to be updated every cycle, writing the matrix
$J_k$ on the SLM becomes the bottleneck of the processor speed. With today's technology, writing a
matrix on an SLM may take a few milliseconds. So $A_t$ is much less than 1. In Fig. 2(a) and (b)
log($A_a$) and log($B_a$) are plotted in terms of the system size, n, respectively. The ratio $A_a > 1$
for n ≥ 10, while $B_a > 1$ for n ≥ 60 and 120, for $I_B$ = 10 and 20 respectively. For the $A_t$ ratio in
the range of $10^{-3}$, we can have a speed advantage for the hybrid analog optical processor for n >
50, and for the hybrid BOC processor for n > 120.

Fig. 2. Plot of log of the ratio (a) $A_a$ of Eq. (19), and
(b) $B_a$ of Eq. (20), in terms of the size of the matrix, n.

Again this ratio $A_t$ depends mainly on how fast we can write a matrix on the SLM. By the
introduction of faster SLMs the speed advantage can be gained for smaller values of n.
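
The break-even sizes quoted for the BOC ratio can be checked with a short calculation. The sketch below assumes the form $B_a = [n^3/3 - 2n(n+1)(I_B - 1)]/I_B$, which is what dividing Eq. (9) by Eq. (11) gives once the time ratio $A_t$ is factored out; the function names and the search limit are illustrative.

```python
def boc_ratio(n, i_b):
    """B_a = [n^3/3 - 2 n (n + 1)(I_B - 1)] / I_B (assumed form of Eq. (20))."""
    return (n ** 3 / 3.0 - 2.0 * n * (n + 1) * (i_b - 1)) / i_b


def smallest_n(i_b, threshold=1.0, n_max=10000):
    """Smallest matrix size n for which B_a exceeds the threshold."""
    for n in range(1, n_max + 1):
        if boc_ratio(n, i_b) > threshold:
            return n
    return None


for i_b in (10, 20):
    print(i_b, smallest_n(i_b))
```

Under this assumed form the ratio first exceeds 1 at n = 55 for $I_B$ = 10 and at n = 115 for $I_B$ = 20, close to the rounded values of 60 and 120 given in the text.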

5. CONCLUSIONS

Two new hybrid optoelectronic processors are introduced for solving systems of nonlinear
equations. The speed of the two processors is analyzed and compared with the speed of digital
processors. It is shown that the main factor limiting the speed is the speed of the SLMs on which
the matrix is written. With the existing SLMs a speed advantage can be gained for n > 100.

ABSTRACT

Analog optics is very fast but not very accurate. Digital electronics is much slower but much more
accurate. Compromise (hybrid) systems appear to be intermediate in both speed and accuracy. As there are
cases in which analog optics is too inaccurate and digital electronics is too slow, hybrid processors may
have an important role to play.


I.  ERRORS  IN  ANALOG  OPTICS 

Despite occasional claims in the literature, making a multichannel analog optical system with all
channels controllable and repeatable to 1% absolute signal accuracy is extremely difficult. Thus if analog
(number magnitude proportional to light irradiance) encoding is used, the accuracy with which numbers can be
represented is, at best, 1% of the maximum magnitude number.

Unfortunately, 1% representation accuracy of inputs does not lead to 1% representation accuracy of
calculated results. Obviously, the exact errors cannot be predicted (otherwise they would hardly count as
errors!). What we can predict is some sort of average or expected error.

Rather than predict errors in specific components of a vector or matrix, we seek more global metrics.
The norm function $\|\cdot\|$ is convenient. The norm of the vector

$$ V = (V_1, V_2, \ldots, V_n)^T $$

is usually defined as

$$ \|V\|_N = [\,|V_1|^N + |V_2|^N + \cdots + |V_n|^N\,]^{1/N} . $$

Three N values are common.

$$ N = 1 : \quad \|V\|_1 = |V_1| + |V_2| + \cdots + |V_n| , $$

$$ N = 2 : \quad \|V\|_2 = [\,|V_1|^2 + |V_2|^2 + \cdots + |V_n|^2\,]^{1/2} , $$

$$ N = \infty : \quad \|V\|_\infty = \max_{k = 1, \ldots, n} |V_k| . $$

Most mathematicians use the N = 2 (Euclidean) norm and drop the subscript, e.g.,

$$ \|V\| = [\,V_1^2 + V_2^2 + \cdots + V_n^2\,]^{1/2} . $$
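
The three common norms can be computed directly from the definition. The function name and the sample vector below are illustrative assumptions.

```python
def vector_norm(v, order):
    """N-norm of a vector: [sum |V_k|^N]^(1/N), or max |V_k| for N = infinity."""
    if order == float("inf"):
        return max(abs(c) for c in v)
    return sum(abs(c) ** order for c in v) ** (1.0 / order)


v = [3.0, -4.0]
for order in (1, 2, float("inf")):
    print(order, vector_norm(v, order))
```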

We can now define a matrix norm

$$ \|A\| = \max_{x} \frac{\|A x\|}{\|x\|} . $$

Since any x can be expanded in terms of the eigenvectors $e_1, e_2, \ldots, e_n$ of A,
we have

$$ x = c_1 e_1 + c_2 e_2 + \cdots + c_n e_n . $$

But

$$ A x = c_1 \lambda_1 e_1 + c_2 \lambda_2 e_2 + \cdots + c_n \lambda_n e_n , $$

where

$$ A e_j = \lambda_j e_j . $$

Clearly

$$ \|A\| = \max_j |\lambda_j| . $$

Equally clear is the relationship

$$ \|A^{-1}\| = \max_j (1/|\lambda_j|) = 1 / \min_j |\lambda_j| . $$
A convenient measure of the ability of a matrix to lead to accurate results is the condition number. This is
sometimes written as $\kappa(A)$, $\chi(A)$, or cond(A). For no particular reason, we choose $\chi(A)$. By definition,

$$\chi(A) = \|A\| \, \|A^{-1}\| = |\lambda|_{max} / |\lambda|_{min} .$$

Roughly speaking, the output error is $\chi(A)$ times the input or "representational" error in solving linear
equations, inverting matrices, solving eigenproblems, etc.

For these introductory purposes, these observations are sufficient. With representational accuracy
fixed at 1% or less and requiring 1% or better accuracy in our results, we conclude that analog accuracy may
suffice if $\chi(A) \le 1$. Since, however,

$$\chi(A) = |\lambda|_{max} / |\lambda|_{min} ,$$

we have $\chi(A) \ge 1$. In fact, we often have $\chi(A) \gg 1$. For these cases analog processors are hopelessly
inaccurate. Accuracy is always lost.
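
The rule of thumb above can be checked on a small case. The following is a minimal sketch, assuming a hypothetical 2 x 2 symmetric matrix (eigenvalues 1.98 and 0.02, so the condition number is 99) and a 1% error placed along the worst-case direction; the matrix, the helper names and the choice of directions are illustrative assumptions, not taken from the paper.

```python
import math

def solve2(A, b):
    """Solve a 2x2 system by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det]

def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(c * c for c in v))

# Hypothetical test matrix: eigenvectors (1, 1) and (1, -1),
# eigenvalues 1.98 and 0.02.
A = [[1.0, 0.98], [0.98, 1.0]]
chi = 1.98 / 0.02                      # |lambda|max / |lambda|min

eps = 0.01                             # 1% representational error
b = [1.0, 1.0]                         # along the large-eigenvalue direction
b_err = [1.0 + eps, 1.0 - eps]         # error along the small-eigenvalue direction

x = solve2(A, b)
x_err = solve2(A, b_err)

rel_in = norm2([p - q for p, q in zip(b_err, b)]) / norm2(b)
rel_out = norm2([p - q for p, q in zip(x_err, x)]) / norm2(x)
amplification = rel_out / rel_in       # close to chi in this worst case
```

In this worst case the amplification equals the condition number; for other error directions it is smaller, which is why the text speaks only "roughly".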

II.  PROBLEMS  WITH  DIGITAL  OPTICS 

Digital optics appears to offer a possible solution. Each number is represented by multiple analog channels
in time and/or space. If multiplicity in space is used and proper number representation is employed,
great parallelism and essentially-analog speed is accomplished at the price of great physical complexity. An
additional problem in formatting and deformatting tends to slow the process and increase the power consumption.
These problems may not be insurmountable but they are certainly difficult enough and far enough away
from solution to motivate the search for alternative (non-digital) ways of making accurate optical processors.


III.  OBSERVATIONS  ON  COMPUTATIONAL  COMPLEXITY 

We want to have a tool for addressing the question: "How difficult is this calculation?" The now-traditional
measure is computational complexity. The basic idea is to break up the operations into their
most primitive parts, e.g., multiplies, and count the number of these required. Actually, we do one other
important calculation. We associate a number N with the problem size, e.g., an N x N matrix has size N. We
then ask how the number of calculations scales with N. Many algorithms, especially in linear algebra, have
polynomial complexity. That is, their complexity scales as roughly $N^p$, written $O(N^p)$ and often said "order of
$N^p$." Note that it is the algorithm, not the problem, that has a complexity. Matrix-matrix multiplication as
we all learned it is $O(N^3)$. Minimal complexity algorithms now approach $O(N^{2.5})$. This difference is far from
subtle for large N.

Another way of viewing computational complexity is as a minimum price to be paid to make a calculation.
That price can be paid in spatial complexity, temporal complexity or both. We will be aiming at high speed
and thus low temporal complexity. To do this we will use a Bimodal Optical Computer (BOC) which does high
complexity tasks by analog optics and lower (essentially by a factor of N) complexity tasks by digital
electronics.

IV.  ILLUSTRATIVE  ALGORITHM 

We suspect all linear algebra problems can be solved by BOC's. We will discuss the simple Ax = b problem
first. Other algorithms for other problems are shown in an appendix.

We suppose we have an analog optical Ax = b solver. We are given, to digital accuracy, A and b. We
represent them in our optical solver as best we can. Our solution vector can be called $x_0$. To check whether
$x_0$ is adequate or not we calculate (digitally, accepting $x_0$ as fully accurate) a residual

$$r_0 = b - A x_0 .$$

This is $O(N^2)$ digitally. If $\|r_0\|$ is acceptably small, we quit. Otherwise we solve

$$A \, \Delta x_0 = r_0$$

optically. Note that

$$A (x_0 + \Delta x_0) = A x_0 + A \, \Delta x_0 = A x_0 + r_0 = A x_0 + (b - A x_0) = b .$$

Thus $x_0 + \Delta x_0$ is the desired x. Unfortunately, our analog solution for $\Delta x_0$ is inaccurate. Our result is not
$\Delta x_0$ but $\delta x_0$.

We form

$$x_1 = x_0 + \delta x_0$$

digitally and evaluate

$$r_1 = b - A x_1$$

digitally. If $\|r_1\|$ is small enough, we quit. Otherwise we recycle.

That is the basic algorithm. It requires some modifications for use with optics. It also requires some
convergence analysis. After all, if the analog solutions are inaccurate, isn't it possible that the solution
will get worse, not better?
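
The loop just described can be simulated digitally. The following is a minimal sketch, not the optical implementation: the "analog" solver is modeled, as an assumption, by exactly solving a copy of the system whose matrix entries carry Gaussian errors of standard deviation `eps` times the largest entry. The names `analog_solve` and `boc_solve`, the test matrix and the tolerance are all hypothetical.

```python
import random

def matvec(A, x):
    """Full-precision ("digital") matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def analog_solve(A, b, eps, rng):
    """Assumed stand-in for the optical solver: A is misrepresented at random."""
    scale = max(abs(a) for row in A for a in row)
    A_noisy = [[a + rng.gauss(0.0, eps * scale) for a in row] for row in A]
    return gauss_solve(A_noisy, b)

def boc_solve(A, b, eps=0.01, tol=1e-10, max_iter=50, seed=0):
    """Analog solve, digital residual, analog correction, repeated."""
    rng = random.Random(seed)
    x = analog_solve(A, b, eps, rng)                      # x_0
    for iteration in range(max_iter):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # r = b - Ax, digital
        if max(abs(ri) for ri in r) < tol:
            return x, iteration
        dx = analog_solve(A, r, eps, rng)                 # inaccurate correction
        x = [xi + di for xi, di in zip(x, dx)]            # digital update
    return x, max_iter

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = boc_solve(A, b)
```

For this well-conditioned example each pass removes roughly one decimal digit of error, so the digital-accuracy answer is reached although no single analog solve is better than about 1%.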


V.  CONVERGENCE 

It is trivial to show that if we can guarantee $\|r_j\| < \|r_{j-1}\|$, then convergence must occur. In Ref. 1, we
showed that this leads to the sufficient condition

$$\chi(A) \, \frac{\|E\|}{\|A\|} < \frac{1}{2} ,$$

where E is the representation error matrix. If we like, we can rewrite this as

$$\|E\| < |\lambda|_{max} / 2\chi(A) .$$

To first order it seems more profitable to assume that $\|E\|/\|A\|$ depends more on the computer than on the
matrix A and can be replaced by a universal number $\epsilon$, which we will call the computer accuracy. Then
convergence occurs if

$$\chi(A) \, \epsilon < 1/2 .$$

For $\epsilon = 0.01$, our hoped-for 1% accurate computer, we strongly expect convergence for

$$\chi(A) < 50 .$$

We cannot guarantee convergence because it is the actual $\|E\|/\|A\|$, not its fictional problem-independent
average, that counts. Furthermore there is no reason to believe that convergence might not occur for much
higher $\chi(A)$ values. We would expect that the probability of convergence is strongly related to

$$R = \chi(A) / \chi_{est} ,$$

where

$$\chi_{est} = 1 / 2\epsilon .$$

Thus we might expect convergence for virtually all R = 1 problems and a much smaller fraction of R = 100
problems. Even this statement hides a complexity. Given a problem and a computer, each particular incident
(attempt to represent and solve the problem) leads to a different result. This can even be a strength
if (a) we can afford the spatial or temporal complexity to calculate N independent answers and (b) we invoke
the central limit theorem to suggest a roughly $\sqrt{N}$ improvement in $\epsilon$.
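
The quantities in this section are simple enough to tabulate. The following is a small sketch under the stated assumptions only: the convergence test is the sufficient condition $\chi(A)\epsilon < 1/2$, and averaging n independent answers is taken to reduce $\epsilon$ by exactly $\sqrt{n}$. The function names are hypothetical.

```python
import math

def chi_est(eps):
    """Largest condition number covered by the sufficient condition."""
    return 1.0 / (2.0 * eps)

def difficulty_ratio(chi, eps):
    """R = chi(A) / chi_est; R <= 1 means the sufficient condition holds."""
    return chi / chi_est(eps)

def averages_needed(chi, eps):
    """Smallest n with chi * eps / sqrt(n) < 1/2 (assumes eps ~ 1/sqrt(n))."""
    n = 1
    while chi * eps / math.sqrt(n) >= 0.5:
        n += 1
    return n

threshold = chi_est(0.01)              # the 1% computer of the text
ratio = difficulty_ratio(200.0, 0.01)  # a harder problem, chi(A) = 200
copies = averages_needed(120.0, 0.01)
```

Because the condition is only sufficient, R above 1 does not imply divergence; it only marks where the guarantee is lost.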

VI.  LORD  KELVIN'S  CONTRIBUTION 

The basic approach of using a fast, low-accuracy processor in conjunction with a slow, high-accuracy computer
is quite old. The history is available in Ref. 1 and references therein. Lord Kelvin (2) made a vital
contribution: the proposal that the residual r be scaled to utilize the dynamic range of the processor well.
The equation

$$A \, \Delta x = r$$

should be multiplied by a scalar s to form

$$A (s \Delta x) = s r ,$$

where $\|s r\|_\infty \approx 1$ (the maximum representable number). It is not necessary to know s to high accuracy because
corrections to corrections are not first order critical. Since r is calculated digitally, we can also calculate
(to low accuracy)

$$s = 1 / \|r\|_\infty$$

at the same time.
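
A sketch of the scaling step follows. Rounding s to a power of two is an assumption made here to illustrate that low accuracy in s is enough; the paper does not prescribe how s is rounded, and the function name is hypothetical.

```python
import math

def kelvin_scale(r):
    """Return (s, s*r) with s a power of two so that 0.5 < max|s*r| <= 1."""
    peak = max(abs(c) for c in r)
    if peak == 0.0:
        return 1.0, list(r)            # nothing to scale
    s = 2.0 ** (-math.ceil(math.log2(peak)))
    return s, [s * c for c in r]

# A small residual is stretched to fill the processor's range.
s, scaled = kelvin_scale([3e-5, -1e-5])
```

The processor then solves for $s \Delta x$, and the digital side divides the returned correction by s before adding it to the current solution.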


VII. THE Ax = b SOLVER

We propose to use the O(1) time complexity, time continuous Ax = b solver of Cheng and Caulfield (3). The
heart of the processor is the fully parallel Stanford matrix-vector multiplier. Figure 1 shows the system.
Input lights representing x components are spread vertically across the columns of an attenuating mask representing
A. Row sums of the transmitted light are detected to give components of the output vector y. For all k, we
allow

$$\Delta_k = b_k - y_k$$

to drive $x_k$. Here $y_k$ is a component of the calculated

$$y = Ax .$$

Fig. 1 System layout of the Bimodal Optical Computer (BOC). Labeled components: planar waveguides, photodiode array.

Cheng and Caulfield showed that smooth convergence to the $\|r\| = 0$ solution occurs at a rate proportional
to

$$e^{-\lambda_{min} t / t_0} ,$$

where $\lambda_{min}$ is the eigenvalue of A with minimum $|\lambda|$ and $t_0$ is a characteristic time (roughly signal round trip
time). Obviously convergence requires

$$\lambda_{min} > 0 .$$

As it turns out (3), this is a sufficient condition for convergence. Calling the normalized time

$$T = t / t_0$$

and noting

$$\lambda_{min} = \lambda_{min} \lambda_{max} / \lambda_{max} = \lambda_{max} / \chi(A) ,$$

we have a relaxation rate

$$e^{-\lambda_{max} T / \chi(A)} .$$

Thus many things affect the relaxation rate:

$t_0$, the inherent processor speed,

$\lambda_{max}$, the maximum eigenvalue, and

$\chi(A)$, the condition number of the problem.

Thus even though the operation is temporally O(1), the convergence speed is clearly problem dependent.
Easy problems (low $\chi$) converge rapidly. Hard problems (high $\chi$) converge slowly.
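
As a worked illustration of the relaxation rate, the normalized time needed for the factor $e^{-\lambda_{max} T/\chi(A)}$ to fall to a chosen level can be solved for directly. This is a sketch only; the function name and the default 1% level are assumptions.

```python
import math

def settling_time(lam_max, chi, accuracy=0.01):
    """Normalized time T = t/t0 at which exp(-lam_max * T / chi) equals `accuracy`."""
    return -chi * math.log(accuracy) / lam_max

easy = settling_time(lam_max=1.0, chi=1.0)     # about 4.6 round-trip times
hard = settling_time(lam_max=1.0, chi=100.0)   # one hundred times longer
```

The linear dependence on $\chi(A)$ is the quantitative form of the statement that hard problems converge slowly.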

VIII.  SPEED  ADVANTAGE 

Let us compare a BOC Ax = b solver with a digital iterative Ax = b solver.

A single cycle requires one hybrid Ax = b solver cycle, $T_{H1}$, plus temporally O(1) A/D and D/A operations,
plus 2N(N+1) digital electronic operations of duration $T_{D1}$. Taking the conversion times into $T_{H1}$, we have a
total time

$$T_H = I_H \left[ T_{H1} + 2N(N+1) \, T_{D1} \right] ,$$

where $I_H$ is the number of required iterations.

The iterative digital Ax = b solver requires a time

$$T_D = I_D \left[ (N^3/3) + 2N^2 \right] T_{D1} .$$

We  want 


Here

$$k = I_H / I_D .$$

Let us consider the two factors separately.
The  quantity 


is problem dependent through $T_{H1}$. For ultra fast electronics (nanoseconds) and small loops ($t_0 \approx 1$
nanosecond), the A/D and D/A converters may be the speed limiters. For 1% accuracy

$$e^{-\lambda_{min} t / t_0} = 0.01 .$$

Thus

$$\lambda_{min} t / t_0 = -\ln (0.01)$$

and

$$T_{H1} = -(t_0 / \lambda_{min}) \ln (0.01) .$$

For $t_0 \approx T_{D1}$,

Q  ~  to/,^H  -  +  4-6*nin 

Clearly

$Q \ge 1$ for all $\lambda_{min} \ge 0.22$,

but perhaps not much less.

The $O(N^3)$, $A_P$, part depends strongly on k. Obviously k > 1. Hopefully we can keep k ≈ 1. Let us examine the
various k > 0 cases plotted in Fig. 2. We see that large advantages occur for low k (rapid convergence) and
large N. That is, we win for large, easy, or (preferably) large and easy problems. If, for example, Q = 0.1,
we obtain a factor of 10 advantage for all k-N products above the horizontal line in Fig. 2. If the problem
size is, say, 200 and k ≈ 1, the advantage can be many, many orders of magnitude.


Fig. 2 The operation advantage, $A_P$

IX.  SIMULATIONS 

We can add Gaussian stochastic errors to the "true" numbers to simulate various accuracies. Figure 3
shows the $I_H$ as a function of $\chi(A)$ for various problems, where $\|r\|/\|x\|$ < 10-* is required. Note $I_H$ /
• since $I_D$ may be 5 to 10. Thus 1:10* accuracy is a low k situation. Note as well that convergence tends
to occur even for R > 1 (R = 4 in Fig. 3). We have achieved convergence for R's as high as 60. We might
want to relax from $\epsilon = 0.01$ (Fig. 3) to much less trying cases. Figure 4 shows that

$$I_H \propto \epsilon ,$$

a very benign result.


SPIE Vol. 634 Optical and Hybrid Computing (1986) / 97


Fig. 3 Computer simulation results for the number of iterations needed for convergence
of the solution, plotted vs. the condition number.

Fig. 4 The number of iterations plotted vs. the error's standard deviation in
the matrix's mask, for a matrix with a condition number = 150. Horizontal axis: STD error in the mask's matrix.

X.  THE  FUTURE 

Besides building a moderately accurate BOC for testing, we will investigate improvements. Two improvements
are suggested below.

First, we can operate on A by "equilibration" to get an equivalent matrix A' such that

$$\chi(A') < \chi(A) .$$

That is

$$Ax = A (D D^{-1}) x = (AD)(D^{-1} x) = A' x' .$$

Here D is a diagonal matrix. For equilibration we might require row norm equality. In our early experiments
this led to roughly 20% improvement in condition number.

Second, we can use "convergence factors" to try to force or improve convergence. That is, rather than

$$x_k = x_{k-1} + \Delta x_{k-1} ,$$

we use

$$x_k = x_{k-1} + \beta_{k-1} \, \Delta x_{k-1} .$$

By clever choice of the $\beta_k$'s we may be able to improve performance.

APPENDIX 

In this appendix we present algorithms for solving the inverse of a matrix and the eigenvalue problems
using the BOC.

1. The Inverse of a Matrix

For an N x N matrix A the inverse matrix $A^{-1}$ is defined to satisfy the following relationship

$$A A^{-1} = I , \qquad \text{(A1)}$$

where I is the identity matrix, which can be rewritten as

$$A \left[ A_1^{-1}, A_2^{-1}, \ldots, A_N^{-1} \right] = I , \qquad \text{(A2)}$$

where $A_j^{-1}$ is the j-th column vector of the inverse matrix $A^{-1}$. So Eq. (A2) can be written as N systems of linear
equations, which can be solved individually using the BOC as outlined in Section IV.

Another method for solving the inverse matrix problem is by using the Pan-Reif method (4). If the
matrix $B \approx A^{-1}$, then define the error matrix E as

$$E = I - BA . \qquad \text{(A3)}$$

$A^{-1}$ can be represented in terms of B and E as

$$A^{-1} = (A^{-1} B^{-1}) B = (I - E)^{-1} B . \qquad \text{(A4)}$$

Now

$$(1 - x)^{-1} = \frac{1}{1 - x} = 1 + x + x^2 + x^3 + \cdots \quad \text{for } |x| < 1 . \qquad \text{(A5)}$$

Similarly,

$$A^{-1} = (I + E + E^2 + E^3 + \cdots) B . \qquad \text{(A6)}$$

If we start with an approximation for the inverse of A by $B_0$, then the error matrix $E_1$ will be given by

$$E_1 = I - B_0 A , \qquad \text{(A7)}$$

or in more general form for an iteration k

$$E_k = I - B_{k-1} A , \qquad \text{(A8)}$$

$$B_k = (I + E_k + E_k^2 + \cdots) B_{k-1} . \qquad \text{(A9)}$$

Pan and Reif introduced a simple way of evaluating $B_0$, the initial approximation of $A^{-1}$. Define the factor t

$$t = \frac{1}{\left( \max_i \sum_j |A(i,j)| \right) \left( \max_j \sum_i |A(i,j)| \right)} , \qquad \text{(A10)}$$

whose denominator is the product of the maximum magnitude of the sum of the rows of A by the maximum magnitude of the sum
of the columns of A. Now $B_0$ will be given by

$$B_0 = t A^H , \qquad \text{(A11)}$$

where $A^H$ is the Hermitian transpose of A.

We now introduce an iterative method for solving the inverse problem. For the case of a small error matrix,
Eq. (A9) can be written as

$$B_k = (I + E_k) B_{k-1} , \quad k = 1, 2, \ldots \qquad \text{(A12)}$$

Now let us outline the iterative method for finding the inverse of A:

i) Find $B_0 = t A^H$.
ii) $E_k = I - B_{k-1} A$, k = 1, 2, ...
iii) $B_k = (I + E_k) B_{k-1}$, k = 1, 2, ...
iv) Cycle iteratively through steps ii) and iii) until all elements of $E_k$ are within the required accuracy,
then terminate the process; $B_k$ is then equal to the inverse of A.

Doing this method using a digital computer requires a long time of operations and a large memory space for
the matrix multiplication. We can do this matrix multiplication using the BOC in much less time with the
same accuracies.
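
Steps i) to iv) can be sketched in a few lines. This is a minimal all-digital illustration, assuming a real matrix (so that $A^H$ is the plain transpose) and taking t as the reciprocal of the product of the maximum absolute row sum and the maximum absolute column sum; the function names, tolerance and test matrix are hypothetical.

```python
def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pan_reif_inverse(A, tol=1e-12, max_iter=100):
    """Iterate E_k = I - B_{k-1} A, B_k = (I + E_k) B_{k-1} from B_0 = t A^T."""
    n = len(A)
    max_row = max(sum(abs(a) for a in row) for row in A)
    max_col = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    t = 1.0 / (max_row * max_col)
    B = [[t * A[j][i] for j in range(n)] for i in range(n)]   # step i)
    for k in range(max_iter):
        BA = matmul(B, A)
        E = [[(1.0 if i == j else 0.0) - BA[i][j] for j in range(n)]
             for i in range(n)]                               # step ii)
        if max(abs(e) for row in E for e in row) < tol:       # step iv)
            return B, k
        I_plus_E = [[(1.0 if i == j else 0.0) + E[i][j] for j in range(n)]
                    for i in range(n)]
        B = matmul(I_plus_E, B)                               # step iii)
    return B, max_iter

A = [[4.0, 1.0], [2.0, 3.0]]
B, steps = pan_reif_inverse(A)
```

Since $E_{k+1} = I - (I + E_k)(I - E_k) = E_k^2$, the error matrix is squared on every pass, which is why few cycles are needed once $E_k$ is small.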

2. The Eigenvalue Problem

Determining the eigenvalues and their corresponding eigenvectors is a very fundamental and important
problem in linear algebra. One of the most powerful methods for determining the eigenvalues and eigenvectors
is the inverse iteration method (5). For an N x N matrix A the eigenvalues $\lambda_j$ and their corresponding
eigenvectors $x_j$ are defined by the equation

$$A x_j = \lambda_j x_j . \qquad \text{(A13)}$$

Let A have N distinct eigenvalues such that

$$|\lambda_1| > |\lambda_2| > \cdots > |\lambda_N| . \qquad \text{(A14)}$$

Assume $q \approx \lambda_j$; then

$$(A - qI) \, y^{(p+1)} = x^{(p)} , \qquad \text{(A15)}$$

where

$$x^{(p+1)} = y^{(p+1)} / \|y^{(p+1)}\|_\infty , \qquad \text{(A16)}$$

and

$$x^{(p)} \to x_j \quad \text{and} \quad 1 / \|y^{(p)}\|_\infty \to |\lambda_j - q| \quad \text{as } p \to \infty . \qquad \text{(A17)}$$

So by assuming a value for the vector $x^{(0)}$, then solving the system of equations
given in Eq. (A15), we get $y^{(1)}$, and from Eq. (A16) we compute $x^{(1)}$. We keep iterating until the vector $y^{(p)}$
becomes stable; then we terminate the iterations. This determines both the eigenvalue and the eigenvector
very accurately.
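
The inverse iteration loop can be sketched digitally as follows. Two details are assumptions made for the illustration rather than taken from the paper: the iterate is normalized by its signed largest-magnitude component (so the sign of $\lambda_j - q$ is recovered, not only its magnitude), and the starting vector is chosen arbitrarily. In a BOC the call to `gauss_solve` would be replaced by the hybrid Ax = b solver.

```python
def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def inverse_iteration(A, q, iters=50):
    """Refine an eigenvalue guess q; returns (eigenvalue, eigenvector)."""
    n = len(A)
    shifted = [[A[i][j] - (q if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    x = [1.0 / (k + 1) for k in range(n)]     # arbitrary starting vector
    gap = 0.0
    for _ in range(iters):
        y = gauss_solve(shifted, x)           # (A - qI) y = x
        i = max(range(n), key=lambda k: abs(y[k]))
        gap = x[i] / y[i]                     # estimate of lambda_j - q
        x = [c / y[i] for c in y]             # largest component scaled to 1
    return q + gap, x

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
lam, vec = inverse_iteration(A, q=2.8)
```

For this hypothetical matrix the eigenvalues are 3 and $3 \pm \sqrt{3}$, so a guess of 2.8 is drawn to the eigenvalue 3.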

The initial value of the eigenvalue q can be determined using Gershgorin's theorem (6).
For an N x N matrix A, let us define the radius as

$$r_k = \sum_{j=1,\, j \ne k}^{N} |a_{kj}| , \qquad \text{(A18)}$$

where $a_{kj}$ is the k,j coefficient of the matrix A. $r_k$ is the radius of a disk $D_k$ centered at $a_{kk}$ within which an
eigenvalue will lie:

$$D_k = \{ \lambda : |\lambda - a_{kk}| \le r_k \} , \quad k = 1, 2, 3, \ldots, N . \qquad \text{(A19)}$$

Then each eigenvalue of A must lie within the union S of these disks

$$S = \bigcup_{k=1}^{N} D_k . \qquad \text{(A20)}$$
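
The disks of Eqs. (A18) to (A20) are cheap to compute. The following short sketch returns each disk as a (center, radius) pair and tests membership in the union; the function names and the example matrix are hypothetical.

```python
def gershgorin_disks(A):
    """Return [(a_kk, r_k)] with r_k the off-diagonal absolute row sum."""
    n = len(A)
    return [(A[k][k], sum(abs(A[k][j]) for j in range(n) if j != k))
            for k in range(n)]

def in_union(value, disks):
    """True if `value` lies in at least one disk."""
    return any(abs(value - center) <= radius for center, radius in disks)

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
disks = gershgorin_disks(A)
```

The disk centers, or points inside the disks, are natural starting values q for the inverse iteration.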

So, after we determine the initial eigenvalues for the matrix A we can use the inverse iteration method to
find more refined values for the eigenvalues and their corresponding eigenvectors. In this process we
will use the BOC to solve the set of equations (A15) within a reasonable accuracy. This method will have the
speed advantage over the all-digital processor because again we have reduced the problem to the solution of a
set of linear equations.
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ABSTRACT 

The  use  of  the  Bimodal  Optical  Computer  (BOC)  in  determining  the  weights  for  an 
adaptive  phased  array  radar  is  introduced.  Interference  canceling  is  presented  for  two  cases: 
first  assuming  the  direction  of  the  jammer  is  known,  secondly  no  a  priori  information  is 
assumed.  Effect  of  the  jammers  on  the  array  pattern  is  shown  for  up  to  four  jammers. 

1.  INTRODUCTION 

The sensitivity of a signal-receiving antenna array system to interfering noise sources can
be reduced by suitable processing of the outputs of the individual array elements. The
processing of the output of the array system acts as an adaptive filtering system [1-4]. The
adaptive phased array radar systems provide the means of suppressing unwanted interference
signals. This is achieved by nulling the array pattern in the direction of the jammers. Many
algorithms have been introduced for the adaptation process and these are reviewed by
Monzingo and Miller [2].

In this paper we present a new technique to determine the weights for the adaptive array
using the bimodal optical computer (BOC) [5-7]. The bimodal optical computer is capable of
solving systems of linear equations very rapidly with high accuracy. In the adaptation process
we reduce the optimization problem to a system of linear equations, which in turn is solved
using the BOC.

In  Section  2  we  review  the  basic  theory  of  adaptive  phased  array  radars.  The  bimodal 
optical  computer  algorithm  for  solving  the  optimization  problem  is  presented  in  Section  3. 
Computer  simulation  results  are  given  in  Section  4.  Conclusions  and  final  remarks  are  given  in 
Section  5. 


2. ADAPTIVE PHASED ARRAYS

In adaptive phased array radars the incoming signal is detected by an array of sensors. The
detected signal is a combination of the target signal plus interference and noise signals. The
system is adjusted in such a way as to suppress reception of the interference signals without
affecting the desired signal.

In this section we consider the two general cases of interference canceling: first by assuming
that the interference signal direction is known; secondly by assuming no a priori information is
known about the interference signal.

Interference  Signal  Direction  is  Known 


When the interference signal direction is known the weights $w_i$ of the array can be chosen
to suppress the interference signal. Let the system shown in Fig. 1 be used to demonstrate this
adaptation technique.

SPIE Vol. 886 Optoelectronic Signal Processing for Phased-Array Antennas (1988) / 171

Fig. 1 Array configuration for interference canceling


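The output signal of the array s(t) is given by [1]

$$s(t) = P \left[ (w_1 + w_3) \sin \omega_0 t + (w_2 + w_4) \sin(\omega_0 t - \tfrac{\pi}{2}) \right]$$
$$\qquad + I \left[ w_1 \sin(\omega_0 t - \theta) + w_2 \sin(\omega_0 t - \theta - \tfrac{\pi}{2}) + w_3 \sin(\omega_0 t + \theta) + w_4 \sin(\omega_0 t + \theta - \tfrac{\pi}{2}) \right] , \qquad (1)$$

where

P = the pilot signal,
I = the interference signal, and
$\theta$ = the phase shift,

$$\theta = \frac{2 \pi d}{\lambda} \sin \psi . \qquad (2)$$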

To cancel the interference signal and to make the signal s(t) equal to the pilot signal, we
need to solve the following system of linear equations for the weights $w_i$:

$$w_1 + w_3 = 1$$
$$w_2 + w_4 = 0$$
$$(w_1 + w_3) \cos\theta - (w_2 - w_4) \sin\theta = 0 \qquad (3)$$
$$(w_2 + w_4) \cos\theta + (w_1 - w_3) \sin\theta = 0 .$$

The  size  of  this  system  of  linear  equations  depends  on  the  number  of  sensors  in  the  array. 
The  number  of  jammers  can  make  the  system  under  or  overdetermined,  which  are  both  time 
consuming  algebra  problems. 
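
For the four-weight example the system is small enough to solve directly. The sketch below assembles the coefficient matrix from the four conditions (pilot gain of one, no quadrature pilot term, and vanishing sine and cosine components of the interference) and solves it digitally; the phase shift of $\pi/6$ and the function names are illustrative assumptions, and in a BOC the solve would be done by the hybrid processor.

```python
import math

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def jammer_null_weights(theta):
    """Weights (w1, w2, w3, w4) that pass the pilot and cancel the jammer."""
    c, s = math.cos(theta), math.sin(theta)
    M = [[1.0, 0.0, 1.0, 0.0],      # w1 + w3 = 1
         [0.0, 1.0, 0.0, 1.0],      # w2 + w4 = 0
         [c, -s, c, s],             # (w1 + w3) cos - (w2 - w4) sin = 0
         [s, c, -s, c]]             # (w2 + w4) cos + (w1 - w3) sin = 0
    return gauss_solve(M, [1.0, 0.0, 0.0, 0.0])

theta = math.pi / 6                 # assumed jammer phase shift
w = jammer_null_weights(theta)
```

By hand, the same equations give $w_1 = w_3 = 1/2$ and $w_2 = -w_4 = \tfrac{1}{2}\cot\theta$, which is undefined when $\sin\theta = 0$, i.e. when the jammer is indistinguishable from the pilot direction.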
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This  is  the  most  general  case  where  we  assume  no  information  about  jammers.  The  system 
used in this case is shown in Fig. 2.

Fig. 2 Basic adaptive array structure with known desired signal

Each of the n sensors receives a signal $x_i(t)$ which is in turn multiplied by a variable weight
$w_i$. The output signal s(t) is compared with the desired signal d(t); their difference, the error
signal $\epsilon(t)$, is used to determine the value of the $w_i$'s. The output of the array is

$$s(t) = \sum_{i=1}^{n} x_i(t) \, w_i \qquad (4)$$

or

$$s(t) = w^T x , \qquad (5)$$

where

$$w = [w_1, w_2, \ldots, w_n]^T \quad \text{and} \quad x = [x_1(t), x_2(t), \ldots, x_n(t)]^T . \qquad (6)$$

For digital sampled data

$$s(j) = w^T x(j) , \qquad (7)$$

and

$$\epsilon(j) = d(j) - s(j) = d(j) - w^T x(j) . \qquad (8)$$

The optimum value of the weights, $w_i$'s, is the one that reduces $\epsilon(j)$ to zero or at least minimizes it.

For N samples of data the optimum weights satisfy the following set of systems of linear
equations:

$$w^T x(1) = d(1)$$
$$\vdots$$
$$w^T x(j) = d(j) \qquad (9)$$
$$\vdots$$
$$w^T x(N) = d(N) .$$

The N sets of equations have n unknowns, and usually N >> n, and are inconsistent and
overspecified. The optimization problem can be rewritten as

$$w_{opt} = R_{xx}^{-1} \, r_{xd} , \qquad (10)$$

where

$$R_{xx} = E\{x x^T\} = E \left\{ \begin{bmatrix} x_1 x_1 & \cdots & x_1 x_n \\ \vdots & & \vdots \\ x_n x_1 & \cdots & x_n x_n \end{bmatrix} \right\} , \qquad (11)$$

$$r_{xd} = E\{x d\} . \qquad (12)$$

The matrix $R_{xx}$ is called the covariance matrix, where $E\{\cdot\}$ is the ensemble average.

Many algorithms have been introduced [2] to solve for the weights in Eq. (10). Some of the popular
algorithms are the least mean square (LMS) and the direct matrix inversion (DMI).

We'll briefly mention the DMI algorithm since it leads to the algorithm introduced in this
paper. Eq. (10) cannot be determined exactly using a limited number of samples of the input
data. For practical consideration a small number of samples is detected to be used in
determining w. The estimated value of Eq. (10) can be given by

$$\hat{w} = \hat{R}_{xx}^{-1} \, \hat{r}_{xd} , \qquad (13)$$

where $\hat{R}_{xx}$ is the sample covariance matrix and $\hat{r}_{xd}$ is the sample cross-correlation vector, given by

$$\hat{R}_{xx} = \frac{1}{K} \sum_{j=1}^{K} x(j) \, x^T(j) \qquad (14)$$

and

$$\hat{r}_{xd} = \frac{1}{K} \sum_{j=1}^{K} x(j) \, d(j) , \qquad (15)$$

and K is the number of samples. The DMI algorithm determines the inverse of the sample
covariance matrix $\hat{R}_{xx}$, then from Eq. (13) evaluates $\hat{w}$.
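
The sample estimates and the resulting weights can be sketched as follows. This illustration assumes real-valued, noise-free synthetic data generated from hypothetical "true" weights, and it solves the linear system $\hat{R}_{xx} \hat{w} = \hat{r}_{xd}$ rather than forming the inverse explicitly; all names and numbers are assumptions made for the example.

```python
import random

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def sample_statistics(X, d):
    """Sample covariance matrix and cross-correlation vector from K snapshots."""
    K, n = len(X), len(X[0])
    R = [[sum(x[i] * x[j] for x in X) / K for j in range(n)] for i in range(n)]
    r = [sum(x[i] * dj for x, dj in zip(X, d)) / K for i in range(n)]
    return R, r

rng = random.Random(1)
w_true = [0.7, -0.3]                              # hypothetical weights
X = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(20)]
d = [sum(wi * xi for wi, xi in zip(w_true, x)) for x in X]

R_hat, r_hat = sample_statistics(X, d)
w_hat = gauss_solve(R_hat, r_hat)                 # solves R_hat w = r_hat
```

With noise-free data the estimate reproduces the generating weights; with receiver noise or few samples the estimate degrades as the sample covariance matrix becomes poorly conditioned.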


3. THE BIMODAL OPTICAL COMPUTER ALGORITHM

With either the LMS or the DMI algorithm, convergence depends on a number of
factors, the most important of which is the condition number of the matrix $R_{xx}$. If the
matrix $R_{xx}$ is ill-conditioned or singular, either the algorithm converges very slowly or the inverse does not
exist, respectively. In such cases other methods might be used, but they are lengthy and time
consuming, so they are not suitable for a system where time is a crucial element.

We have shown in previous publications [5-7] that the bimodal optical computer is capable of
solving such problems, where the system of equations is ill-conditioned, singular, overspecified
or underspecified. The BOC is a hybrid system in nature, Fig. 3. It uses analog optics to solve
the problem approximately but rapidly and it utilizes the digital electronics to
refine the solution, in an iterative scheme.

The optimization problem for the weights w introduced in Section 2 can
be rewritten in the following form, from Eq. (13):

$$\hat{R}_{xx} \, \hat{w} = \hat{r}_{xd} . \qquad (16)$$

Eq. (16) is a system of linear equations that can be solved using the bimodal
optical computer. Among the advantages of using the BOC over the
conventional techniques are speed, especially for large arrays, and
convergence of the solution for difficult problems (ill-conditioned or singular
systems), which is the case for most adaptive array radar problems.

Fig. 3 The Bimodal Optical Computer


In the following section we present some of the preliminary results from computer
simulation studies of the BOC in processing adaptive array problems.

4. SIMULATION RESULTS

Two simulation experiments are presented in this section. In the first experiment we used
a five element array, and assumed the directions of the jammers are known. In the second
experiment a two element array is used and no a priori information is assumed.

In Fig. 4 the array pattern of the five element array is plotted as a function of the angle $\psi$.
Fig. 4(a) shows the array pattern before adaptation. In Fig. 4(b) a jammer at 45° was
considered; the pattern after adaptation is shown, the jammer direction being known. The array pattern after
adaptation has reformed in such a way as to null the jammer signal. In Fig. 4(c) four jammers
are considered at 45°, 80°, 120° and 150°; the array pattern is again reformed to null reception of all the
jammer signals.


Fig. 4 Phased array pattern for 5 elements, (a) before adaptation, (b) adapted pattern
for a jammer at 45°, and (c) adapted pattern for four jammers at 45°, 80°, 120°,
and 150°.

In Fig. 5 the BOC was used to solve the adaptation problem assuming no a priori
information about the interference signals. Fig. 5(a) shows the two-element array pattern
before adaptation. In Fig. 5(b) to (d) the pattern is plotted for a single jammer placed at 30°,
45° and 60°, respectively. In all these plots the array adapted to cancel the interference signal
in each of the given cases. In all of the above results the jammer signal is considered to be of
the same strength as the desired signal, and convergence of the solution is obtained in fewer
than five iterations. Also the condition number of $R_{xx}$ is between $10^6$ and $\infty$.


Fig. 5 Two-element phased array pattern, (a) before adaptation, (b) to (d) adapted
patterns for single jammers at 30°, 45° and 60°, respectively.


5. CONCLUSIONS

The bimodal optical computer is shown in these preliminary results to be a powerful
means of solving adaptive phased array problems. In future work we will consider larger
array sizes, receiver noise, and very strong interference signals.
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ABSTRACT 

The use of the bimodal optical computer (BOC) in determining the
weights for an adaptive phased array radar is introduced. Interference
canceling is presented for two cases: (1) assuming the direction of the
jammer is known and (2) assuming no a priori information. The effect of
the jammers on the array pattern is shown for up to four jammers.

1.  INTRODUCTION 

The sensitivity of a signal-receiving antenna array system to
interfering noise sources can be reduced by suitable processing
of the outputs of the individual array elements. The processing
of the output of the array system acts as an adaptive filtering
system [1-4]. The adaptive phased array radar systems provide
the means of suppressing unwanted interference signals. This
is achieved by nulling the array pattern in the direction of
the jammers. Many algorithms have been introduced for the
adaptation process and they are reviewed by Monzingo and
Miller [2].

In this paper we present a new technique for determining
the weights for the adaptive array using the bimodal optical
computer (BOC) [5-7]. The bimodal optical computer is capable
of solving systems of linear equations very rapidly with
high accuracy. In the adaptation process we reduce the problem
to a system of linear equations, which in turn is solved
using the BOC.

In Section 2 we review the basic theory of adaptive phased
array radars. The bimodal optical computer algorithm for
solving the adaptation problem is presented in Section 3.
Computer simulation results are given in Section 4. Conclusions
and final remarks are given in Section 5.

2.  ADAPTIVE  PHASED  ARRAYS 

In adaptive phased array radars the incoming signal is detected
by an array of sensors. The detected signal is a combination
of the target signal plus interference and noise signals.
The system is adjusted in such a way as to suppress the interference
signal reception without affecting the desired signal.

In this section we consider the two general cases of interference
canceling: (1) by assuming that the interference signal
direction is known and (2) by assuming no a priori information
is known about the interference signal.

2.1. Interference Signal Direction is Known. When the interference
signal direction is known the weights $w_i$ of the array
can be chosen to suppress the interference signal. Let the
system shown in Figure 1(a) be used to demonstrate this
adaptation technique. The output signal of the array s(t) is
given by [1]

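$$ s(t) = P\left[(w_1 + w_3)\sin\omega_0 t + (w_2 + w_4)\sin(\omega_0 t - \tfrac{1}{2}\pi)\right] $$
$$ \qquad + I\left[w_1\sin(\omega_0 t - \theta) + w_2\sin(\omega_0 t - \theta - \tfrac{1}{2}\pi) + w_3\sin(\omega_0 t + \theta) + w_4\sin(\omega_0 t + \theta - \tfrac{1}{2}\pi)\right], \qquad (1) $$

where

P = the pilot signal,

I = the interference signal,

$\theta$ = the phase shift, (2)

$$ \theta = \frac{2\pi d}{\lambda}\sin\psi . $$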

To cancel the interference signal and to make the signal
s(t) equal to the pilot signal, we need to solve the following
system of linear equations for the weights $w_i$:

$$ \begin{aligned} w_1 + w_3 &= 1, \\ w_2 + w_4 &= 0, \\ (w_1 + w_3)\cos\theta - (w_2 - w_4)\sin\theta &= 0, \\ (w_2 + w_4)\cos\theta + (w_1 - w_3)\sin\theta &= 0. \end{aligned} \qquad (3) $$

The size of this system of linear equations depends on the
number of sensors in the array. Depending on the number of
jammers, the system can be underdetermined or overdetermined,
and both cases are time-consuming algebra problems.
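
As a worked illustration of Section 2.1, the following sketch solves the four cancellation conditions in closed form and checks numerically that the interference contribution to the array output vanishes. It assumes the interference term has the form I[w1 sin(ω0t − θ) + w2 sin(ω0t − θ − π/2) + w3 sin(ω0t + θ) + w4 sin(ω0t + θ − π/2)] and that sin θ ≠ 0; the function names and the sample values of θ and the jammer strength are illustrative and do not come from the paper.

```python
import math

def weights_known_direction(theta):
    """Closed-form solution of the four cancellation equations; needs sin(theta) != 0."""
    if abs(math.sin(theta)) < 1e-12:
        raise ValueError("the system is singular when sin(theta) = 0")
    half_cot = 0.5 * math.cos(theta) / math.sin(theta)
    return [0.5, half_cot, 0.5, -half_cot]  # w1, w2, w3, w4

def array_output(w, theta, phase, pilot=1.0, interference=1.0):
    """Array output at carrier phase `phase` (= omega_0 * t)."""
    w1, w2, w3, w4 = w
    quarter = math.pi / 2  # quarter-period delay applied to weights w2 and w4
    pilot_part = (w1 + w3) * math.sin(phase) + (w2 + w4) * math.sin(phase - quarter)
    jammer_part = (w1 * math.sin(phase - theta)
                   + w2 * math.sin(phase - theta - quarter)
                   + w3 * math.sin(phase + theta)
                   + w4 * math.sin(phase + theta - quarter))
    return pilot * pilot_part + interference * jammer_part

theta = math.pi / 4  # illustrative phase shift
w = weights_known_direction(theta)
for phase in (0.3, 1.0, 2.5):
    # With the adapted weights the output should match the pilot alone, sin(phase),
    # even though the jammer is five times stronger than the pilot.
    print(phase, array_output(w, theta, phase, interference=5.0), math.sin(phase))
```

For a larger array the system no longer has such a simple closed form, which is where a fast general-purpose linear solver becomes useful.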

2.2. No A Priori Information is Known. This is the most
general case, where we assume no information about the jammers.
The system used in this case is shown in Figure 1(b). Each of
the n sensors receives a signal $x_i(t)$ that is in turn multiplied
by a variable weight $w_i$. The output signal s(t) is compared
with the desired signal d(t), and their difference, the error
signal $\epsilon(t)$, is used to determine the value of $w_i$. The output of
the array is

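$$ s(t) = \sum_{i=1}^{n} x_i(t)\,w_i \qquad (4) $$

or

$$ s(t) = \mathbf{w}^T\mathbf{x}, \qquad (5) $$

where

$$ \mathbf{x} = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix}. \qquad (6) $$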

For digital sampled data

$$ s(j) = \mathbf{w}^T\mathbf{x}(j) \qquad (7) $$

and

$$ \epsilon(j) = d(j) - s(j) = d(j) - \mathbf{w}^T\mathbf{x}(j). \qquad (8) $$

The optimum value of the weights $w_i$ is the one that reduces
$\epsilon(j)$ to zero or at least minimizes it.

For N samples of data the optimum weights satisfy the
following set of linear equations:

$$ \begin{aligned} \mathbf{w}^T\mathbf{x}(1) &= d(1), \\ &\;\;\vdots \\ \mathbf{w}^T\mathbf{x}(j) &= d(j), \\ &\;\;\vdots \\ \mathbf{w}^T\mathbf{x}(N) &= d(N). \end{aligned} \qquad (9) $$

Figure 1 Basic adaptive array system with (a) signal and noise directions known and (b) no a priori information assumed


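These N equations have n unknowns; usually $N \gg n$, so the
equations are inconsistent and overspecified. The optimization
problem can be rewritten as

$$ \mathbf{w}_{opt} = R_{xx}^{-1}\,\mathbf{r}_{xd}, \qquad (10) $$

where

$$ R_{xx} = E\{\mathbf{x}\mathbf{x}^T\} \qquad (11) $$

and

$$ \mathbf{r}_{xd} = E\{\mathbf{x}d\}. \qquad (12) $$

The matrix $R_{xx}$ is called the covariance matrix, where $E\{\cdot\}$ is
the ensemble average.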

Many algorithms have been introduced [2] to solve for the weights
in Eq. (10). Some of the popular algorithms are the least mean
square (LMS) and the direct matrix inversion (DMI).

We will briefly mention the DMI algorithm since it leads to
the algorithm introduced in this paper. Equation (10) cannot
be determined exactly using a limited number of samples of
the input data. For practical considerations a small number of
samples is used in determining w. The estimated value of
Eq. (10) can be given by

$$ \hat{\mathbf{w}} = \hat{R}_{xx}^{-1}\,\hat{\mathbf{r}}_{xd}, \qquad (13) $$

where $\hat{R}_{xx}$ is the sample covariance matrix and $\hat{\mathbf{r}}_{xd}$ is the
sample cross-correlation vector, given by

$$ \hat{R}_{xx} = \frac{1}{K}\sum_{j=1}^{K}\mathbf{x}(j)\,\mathbf{x}^T(j) \qquad (14) $$

and

$$ \hat{\mathbf{r}}_{xd} = \frac{1}{K}\sum_{j=1}^{K}\mathbf{x}(j)\,d(j). \qquad (15) $$

K is the number of samples. The DMI algorithm determines
the inverse of the sample covariance matrix $\hat{R}_{xx}$ and then
evaluates w from Eq. (13).
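
The DMI estimate can be sketched in a few lines. The example below forms the sample covariance matrix and the sample cross-correlation vector from K snapshots and then solves the resulting linear system for the weights. It is a minimal sketch under assumptions that are not in the paper: the signals are real valued, the snapshot data are made up for illustration, and plain Gaussian elimination stands in for the matrix inversion step.

```python
def sample_statistics(xs, ds):
    """Sample covariance matrix and cross-correlation vector from K snapshots."""
    K, n = len(xs), len(xs[0])
    R = [[sum(x[a] * x[b] for x in xs) / K for b in range(n)] for a in range(n)]
    r = [sum(x[a] * d for x, d in zip(xs, ds)) / K for a in range(n)]
    return R, r

def solve(A, b):
    """Gaussian elimination with partial pivoting (stand-in for the inversion step)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Hypothetical data: four snapshots of a two-sensor array whose desired signal
# is produced exactly by the weights w_true, so the estimate should recover them.
w_true = [0.5, -0.25]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
ds = [sum(wi * xi for wi, xi in zip(w_true, x)) for x in xs]

R_hat, r_hat = sample_statistics(xs, ds)
w_hat = solve(R_hat, r_hat)
print(w_hat)
```

Solving the system directly, rather than forming the inverse explicitly, is a design choice of this sketch; it is also the form in which the problem is later handed to the BOC.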
Figure 2 The bimodal optical computer used in solving a system of linear equations.


3.  THE  BIMODAL  OPTICAL  COMPUTER  ALGORITHM 

Convergence of either the LMS or the DMI algorithm depends
on a number of factors, the most important being the
condition number of the matrix $\hat{R}_{xx}$. If the matrix is
ill-conditioned, the algorithm converges very slowly; if it is
singular, the inverse does not exist. In such cases other
methods might be used, but they are lengthy and time consuming,
so they are not suitable for a system where time is a
crucial element.

We have shown in previous publications [5-7] that the
bimodal optical computer is capable of solving such problems,
where the system of equations is ill-conditioned, singular,
overspecified, or underspecified. The BOC is a hybrid system
by nature; see Figure 2. It uses analog optics to solve the
problem approximately but rapidly, and it uses digital
electronics to refine the solution in an iterative scheme.

The adaptation problem for the weights w introduced in
Section 2 can be rewritten in the following form, from Eq.
(13):

$$ \hat{R}_{xx}\,\mathbf{w} = \hat{\mathbf{r}}_{xd}, \qquad (16) $$

which can be written as

$$ A\mathbf{x} = \mathbf{b}, \qquad (17) $$

where

$$ A = \hat{R}_{xx}, \qquad \mathbf{x} = \mathbf{w}, \qquad \mathbf{b} = \hat{\mathbf{r}}_{xd}. $$

Equation (16) is a system of linear equations that can be
solved using the bimodal optical computer. Among the advantages
of using the BOC over the conventional techniques
are speed (especially for large arrays) and convergence of the
solution for difficult problems such as ill-conditioned and
singular systems, which is the case for most adaptive array
radar problems.


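We review here the BOC algorithm for solving the system
$A\mathbf{x} = \mathbf{b}$.

(a) Solve $A\mathbf{x} = \mathbf{b}$ using the analog optical processor to get $\mathbf{x}_0$.

(b) With a dedicated digital electronics processor, read $\mathbf{x}_0$
and evaluate the residue

$$ \mathbf{r}_0 = \mathbf{b} - A\mathbf{x}_0 = A\mathbf{x} - A\mathbf{x}_0 = A(\Delta\mathbf{x}_0). \qquad (18) $$

(c) Normalize $\mathbf{r}_0$ to use the dynamic range of the system.

(d) Solve optically the system

$$ A\mathbf{z} = s\,\mathbf{r}_0, \qquad (19) $$

where

$$ \mathbf{z} = s(\Delta\mathbf{x}_0) \qquad (20) $$

and s is the radix used in normalizing r.

(e) Evaluate electronically

$$ \mathbf{x}_1 = \mathbf{x}_0 + \Delta\mathbf{x}_0 \qquad (21) $$

and

$$ \mathbf{r}_1 = \mathbf{b} - A\mathbf{x}_1. \qquad (22) $$

(f) If $|\mathbf{r}_1|$ is small enough, stop. Otherwise, go to (c) and
recycle.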
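
Steps (a) through (f) amount to iterative refinement around an inexact solver, and can be sketched as follows. The analog optical processor is not available in software, so `analog_solve` below is a hypothetical stand-in that solves a deliberately perturbed copy of the system (every matrix entry is off by 2%); the perturbation pattern, tolerance, and test matrix are assumptions chosen for illustration only.

```python
def solve(A, b):
    """Exact digital solve by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def analog_solve(A, b, rel_err=0.02):
    """Hypothetical model of the optical processor: it solves a perturbed system."""
    A0 = [[a * (1 + rel_err * (-1) ** (i + j)) for j, a in enumerate(row)]
          for i, row in enumerate(A)]
    return solve(A0, b)

def boc_solve(A, b, tol=1e-10, max_iter=50):
    """Return the refined solution and the number of refinement passes used."""
    x = analog_solve(A, b)                                # step (a)
    for passes in range(max_iter):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # steps (b)/(e): digital residue
        r_max = max(abs(v) for v in r)
        if r_max < tol:                                   # step (f)
            return x, passes
        s = 1.0 / r_max                                   # step (c): radix
        z = analog_solve(A, [s * v for v in r])           # step (d)
        x = [xi + zi / s for xi, zi in zip(x, z)]         # step (e)
    raise RuntimeError("refinement did not converge")

A = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
x, passes = boc_solve(A, b)
print(x, passes)
```

The residue is always computed digitally with the exact matrix, so the accuracy of the final answer is set by the digital arithmetic, while the inexact solver only determines how many passes are needed.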

In the following section we present some of the preliminary
results from computer simulation studies of the BOC in
processing adaptive array problems.

4.  SIMULATION  RESULTS 

Two simulation experiments are presented in this section. In
the first experiment we used a five-element array and assumed
the directions of the jammers were known. In the second
experiment a two-element array is used and no a priori
information is assumed.

Figure 3 Phased array pattern for five elements (a) before adaptation, (b) adapted for a jammer at 45°, and (c) adapted for four jammers at 45°, 80°, 120°, and 150°.

Figure 4 Two-element phased array pattern (a) before adaptation and (b)-(d) adapted for single jammers at 30°, 45°, and 60°, respectively.

MICROWAVE AND OPTICAL TECHNOLOGY LETTERS / Vol 1 No 7 September 1988 239

In Figure 3 the five-element array pattern is plotted as a
function of the angle $\phi$. Figure 3(a) shows the array pattern
before adaptation. In Figure 3(b) the pattern after adaptation
is shown for a jammer at 45°. The array pattern after adaptation
has reformed in such a way that it nulls the jammer
signal. In Figure 3(c) four jammers are considered at 45°, 80°,
120°, and 150°. The array pattern is again reformed to null
the reception of all the jammer signals.

In Figure 4 the BOC was used to solve the adaptation
problem assuming no a priori information about the interference
signals. Figure 4(a) shows the two-element array pattern
before adaptation. In Figure 4(b)-(d) the pattern is
plotted for a single jammer placed at 30°, 45°, and 60°,
respectively. In all these plots the array adapted to cancel the
interference signal in each of the given cases. In all of the
preceding results the jammer signals are considered to be of
the same strength as the desired signal, and convergence of
the solution was obtained in fewer than five iterations. Also, the
condition number of $\hat{R}_{xx}$ is between $10^6$ and $\infty$.


5. CONCLUSIONS

The bimodal optical computer is shown in these preliminary
results to be a powerful means for solving adaptive phased
array problems. In future work we will consider larger
array sizes, receiver noise, and very strong interference signals.

Abstract

An optical-hybrid matrix processor is presented and its speed is compared with that of a digital electronic
processor. Optical-hybrid matrix processors are shown to be far superior in speed in solving
systems of linear equations. This advantage in speed increases with the matrix size. The
problem of the convergence of the solution using the optical-hybrid processor is investigated. It is found that even
when using electro-optical systems with an error as high as 5% in the I/O devices, convergence was achieved for
matrices with condition numbers as high as 150. Some means of improving the condition number of a matrix are
also introduced.


I. Introduction


Analog optics is very attractive for signal processing and computing because of its ability to process
two-dimensional data in parallel very rapidly. Unfortunately, this high-speed parallel processing achieves
only low accuracy because of the nature of analog processing, especially in optical systems. These
accuracy problems arise from errors in representing and reading the signal using the electro-optic I/O devices.
The method introduced by Caulfield [1] (which is outlined in section II of this paper) combines the high speed and
parallelism of the optical processor and the high accuracy of the digital computer, using Lord Kelvin's
iterative method [2]. In section II of this paper we present a comparison between the time required to solve
a system of linear equations using the optical-hybrid processor and that required by the digital processor. In
section III we present a numerical analysis of the convergence of the solutions of linear algebraic
equations as a function of the condition number of the matrix and the errors in representing the I/O data in
the optical system, using computer simulation of the optical-hybrid processor. In section IV conclusions and
final remarks are drawn.


II.  Computation  speed  analysis 

The optical-hybrid processor works in the following manner for a system of linear equations (it is also
applicable to other problems, both linear and nonlinear),

$$ A x = b, \qquad (1) $$

where A is an n×n matrix, and x and b are n×1 vectors.

a) Using an optical analog processor we can calculate an approximate solution $x^o$ of the linear system. The
superscript o's indicate inaccuracies in the optics and electronics, so the equations solved by the optical
processor are

$$ A^o x^o = b^o. \qquad (2) $$

b) Remember the solution to a high accuracy with the digital computer. Use a dedicated digital processor to
calculate the residue

$$ r = b - A x^o = A(x - x^o) = A\,\Delta x. \qquad (3) $$

c) Use the optical analog processor to solve the linear equations

$$ A^o z = s\,r^o, \quad \text{where } z = s\,\Delta x, \qquad (4) $$

for $\Delta x$, where s is a "radix", or scale factor, chosen to make good use of the dynamic range.

d) Use the digital processor to refine the solution for x:

$$ x^1 = x^o + \Delta x. \qquad (5) $$

If the refined solution $x^1$ is accurate enough, terminate the iterations. Otherwise go back to b), c) and
d) for a more refined solution, following the procedure outlined above.

To get some quantitative values for the speed of this process compared to that carried out by the digital
computer, we will calculate the number of operations required by each method and then multiply it by the time
required by each operation. We are going to consider the number of operations regardless of whether they are additions
or multiplications.

Let us consider an n×n matrix A. The time required for one iteration of the procedure outlined above,
$T_{O1}$, is given by

$$ T_{O1} = T_{A1} + 2n(n+1)\,T_{D1}, \qquad (6) $$

where $T_{A1}$ = the time required to solve A x = b by analog optics,
and $T_{D1}$ = the time required to make one digital operation.

Therefore the time required to make $I_O$ iterations with the optical processor is given by

$$ T_O = I_O\,[\,T_{A1} + 2n(n+1)\,T_{D1}\,], \qquad (7) $$

while the time $T_D$ required by the digital computer to solve the linear equations using Cholesky's method [3]
in $I_D$ iterations is given by

$$ T_D = (n^3/3 + 2n^2)\,I_D\,T_{D1}. \qquad (8) $$

The condition which we need to satisfy to have an advantage in time in using the optical processor over the
digital processor is

$$ T_O \ll T_D. \qquad (9) $$

Therefore, for a clear time advantage for the optical-hybrid processor, from Eqs. (7)-(9) we want

$$ I_O\,[\,T_{A1} + 2n(n+1)\,T_{D1}\,] \ll (n^3/3 + 2n^2)\,I_D\,T_{D1} \qquad (10) $$

or

$$ k\,[\,T_{A1} + 2n(n+1)\,T_{D1}\,] \ll (n^3/3 + 2n^2)\,T_{D1}, \qquad (11) $$

where $k = I_O/I_D$. Eq. (11) can be rewritten in the following form:

$$ \frac{n^3/3 + 2n^2(1-k) - 2kn}{k}\;\frac{T_{D1}}{T_{A1}} \gg 1. \qquad (12) $$


The advantage in speed of using the optical-hybrid processor over the digital processor is obvious from
Eq. (12), and it increases with the size of the matrix n. To examine this condition
carefully, let us rewrite Eq. (12) in the following form:

$$ A_P\,A_I \gg 1, \qquad (13) $$

where

$$ A_P = 2\,[\,n^3/6 + n^2(1-k) - nk\,]/k \qquad (14) $$

and

$$ A_I = T_{D1}/T_{A1}. \qquad (15) $$

Here $A_I$ is an "inherent advantage". A single analog operation is much faster than the digital one. The whole
A x = b solution will be slower than a single digital operation, but the analog optical A x = b solver works at
speeds independent of n. On the other hand, $T_{D1}$ is operation dependent, and it includes the time spent
performing the operation and storing and retrieving the data from the memory of the computer, which is time
consuming, especially as n increases.

$A_P$ is a problem-related advantage; it is a function of the
size of the matrix n and the ratio of iterations k.
$A_P$ is plotted in Fig. 1 as a function
of n and k. It is clear that $A_P$ increases very rapidly
with the size n, even if the number of
iterations in the optical-hybrid processing scheme is
much larger than that for the digital processing,
while in reality they will be approximately the same
for the same problem conditions.
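
To make the dependence on n and k concrete, the short sketch below evaluates the problem-related advantage 2[n³/6 + n²(1 − k) − nk]/k for the three iteration ratios k = 1, 10, and 20. The function name and the sampled matrix sizes are illustrative choices, not values taken from the paper.

```python
def problem_advantage(n, k):
    """Problem-related advantage: 2 [n^3/6 + n^2 (1 - k) - n k] / k."""
    return 2.0 * (n**3 / 6.0 + n**2 * (1 - k) - n * k) / k

# Tabulate the advantage for a few matrix sizes and iteration ratios.
for k in (1, 10, 20):
    row = [round(problem_advantage(n, k)) for n in (10, 100, 1000)]
    print("k =", k, "->", row)
```

A negative value means that, for that n and k, the hybrid scheme already needs more digital operations than the purely digital solution, so no analog speed can compensate; the cubic term makes the value positive once n is large enough.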


[Fig. 1 caption fragment:] ... of k = 1, 10 and 20.

III. Convergence of the solution


The block diagram of the optical-hybrid processor is shown in Fig. 2. The solution of the linear
algebraic equations will be done optically using the method introduced by Cheng and Caulfield [4]. The question
of convergence is discussed in Cheng and Caulfield's paper, where it is found that if the matrix has positive
eigenvalues then the solution will converge regardless of the size of the matrix. This applies only to step
c) of the procedure outlined in Section II. We turn next to the total process.


Fig. 2 System layout of the optical-hybrid processor. (Figure labels: planar waveguides, photodiode array.)


In this section of the paper we present a numerical analysis of the convergence of the solution and its
dependence on the condition number of the matrix. The condition number of the matrix A is defined as

$$ \chi(A) = \|A\|\,\|A^{-1}\|, \qquad (16) $$

where $\|\cdot\|$ is the norm of the matrix. The condition number is a measure of the accuracy of the A x = b
solutions. The larger the condition number, the less accurate the result achieved with any fixed-accuracy
computer. In this paper we report a simulation of the system shown in Fig. 2 by a computer algorithm to study
the convergence of the solution of the linear equations. The computer algorithm simulates the analog optical
processor and the electro-optic I/O devices in such a way that allows us to control the errors occurring in
representing the matrix by an optical mask, and also the errors in reading the photodiode voltage and in
converting the input to the system to light by the LED's. To simulate the experimental environment we have
used a Gaussian random number generator to generate the error signals.

The curve shown in Fig. 3 is the result of a simulation experiment for the optical-hybrid processor with
the following characteristics: The matrix A can be represented by an optical mask (a photographic film or
a spatial light modulator) with an error of standard deviation of 1% of the maximum coefficient of the matrix.
The vector x can be read with an error of standard deviation = 1% of the maximum element of the vector x,
and the error standard deviation in representing b by the photodiode is 1%. From Fig. 3 we see that the
solutions converge with an error less than one millionth (or any other accuracy) even for condition number
500. For condition numbers less than 250 the number of iterations required is less than 20. In order to
guarantee convergence with 1% accuracies, we must restrict matrices to condition numbers less than 50.

To study the effect of the error in representing the matrix by an optical mask on the number of iterations
needed to get the solution within $10^{-6}$ accuracy, we have changed the standard deviation of the error in
representing the matrix over the range from 1% to 30% for a condition number of 150, and we calculated the number
of iterations required for each case. The relationship between the number of iterations and the standard
deviation of the error in representing the matrix is plotted in Fig. 4. As the error increases the number of
iterations increases in an almost linear way. Even for an error of 30% in representing the matrix, the
solution still converges. This interesting result shows that even when using inaccurate optics, the optical-hybrid
processor can still solve the linear system of equations very accurately.

Fig. 3 The number of iterations needed for convergence of the solution plotted vs. the condition number of A.

Fig. 4 The number of iterations plotted as a function of the error's standard deviation
in the matrix's mask, for a matrix with a condition number = 150.

The condition number is one of the determining factors of the speed of convergence of the solutions, as
can be seen in Fig. 3. Smaller condition numbers yield faster convergence of the solution. In searching for a
way to improve the condition number of a given matrix, we found that one way of doing so is by normalizing the
matrix in the following manner:

$$ a'_{ij} = a_{ij} \,/\, (a_{i1}^2 + a_{i2}^2 + \cdots + a_{in}^2)^{1/2}, \quad i, j = 1, 2, \ldots, n, \qquad (17) $$

where the $a_{ij}$'s are the coefficients of the matrix A. This normalization decreases the value of the condition
number of the matrix, which in turn increases the speed of the convergence process. Fig. 5 shows a plot of
the condition number before and after the normalization of the matrix, from which we can see an improvement in
the condition number after normalization.
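
The effect of this normalization can be illustrated on a small example. The sketch below divides each row of a 2×2 matrix by its Euclidean length and compares the condition number before and after. Two assumptions are made that the paper does not state: the condition number is evaluated in the infinity norm, and the example matrix, whose rows differ greatly in scale, is invented for illustration.

```python
import math

def row_normalize(A):
    """Divide every row of A by its Euclidean length."""
    return [[a / math.sqrt(sum(v * v for v in row)) for a in row] for row in A]

def cond_inf_2x2(A):
    """Condition number ||A|| * ||A^-1|| of a 2x2 matrix in the infinity norm."""
    (a, b), (c, d) = A
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    norm = lambda M: max(sum(abs(v) for v in row) for row in M)
    return norm(A) * norm(inv)

A = [[100.0, 200.0], [3.0, 1.0]]  # rows of very different scale (illustrative)
A_norm = row_normalize(A)
print(cond_inf_2x2(A))       # before normalization
print(cond_inf_2x2(A_norm))  # after normalization
```

When the rows of A are rescaled in this way, the corresponding elements of b have to be divided by the same row factors so that the solution x is unchanged. The improvement is largest for badly scaled rows, as in this example.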


Fig. 5 The condition number of the matrix after it
has been normalized is plotted versus the
original condition number.

IV. Conclusions


The optical-hybrid matrix processor discussed in this paper has shown very promising results. It is
clearly very competitive in both speed and accuracy with the digital processor in solving a system of linear
equations. The advantage in speed of the processor increases with the size of the matrix.
The analysis carried out in this paper is not limited to the solution of a system of linear equations but is
applicable as well to other linear and nonlinear problems. Another interesting result presented here is that
the optics used in the processor can have a tolerance of 5 to 10% without sacrificing the accuracy
of the solution, although it is shown that the smaller the error in both optics and electronics, the faster the
solution will converge.

ABSTRACT


An algorithm for computing the eigenvalues and the corresponding
eigenvectors of a matrix using the bimodal optical computer (BOC) is
presented. The accuracy of the solutions is similar to that of the
digital computer. The speed of the computation is compared to that of
existing supercomputers. The BOC is shown to have an advantage in speed,
especially for large matrices. The advantage in speed increases
with the size of the matrix.




I  INTRODUCTION 


Eigenvalue problems arise in many physical problems. The
eigenvalue solutions are often performed by iterative methods using
digital computers [1]. Solving this class of problems is time
consuming. The time required for determining the eigenvalues and
eigenvectors increases with the size of the matrix. For matrices of
very high rank the digital computer becomes very slow. Optics appears
to be a natural candidate for tackling such a class of problems.
Previous work on optical eigenvalue processors has potential
accuracy problems [2-4].

In this paper we introduce a method to determine the eigenvalues
and their corresponding eigenvectors for a positive definite matrix
using the bimodal optical computer (BOC) [5,6]. The accuracy of the
solution is equivalent to that of a floating point processor because
of the hybrid nature of the BOC, provided that convergence occurs at
all. The method is outlined in Section II. The speed of the algorithm
is analyzed in Section III. Conclusions are drawn in Section IV.

II  EIGENVALUE  ALGORITHM 

For an n×n matrix A the eigenvalues and the eigenvectors are given
by

$$ A\,e_i = \lambda_i\,e_i, \quad i = 1, 2, \ldots, n, \qquad (1) $$

where the $\lambda_i$'s are the eigenvalues of the matrix A and the $e_i$'s are the
corresponding eigenvectors. In this paper we are going to consider the
case where the eigenvalues of the matrix are all positive, real and not
equal, i.e.

$$ \lambda_1 > \lambda_2 > \lambda_3 > \cdots > \lambda_n > 0. \qquad (2) $$

There are many methods for determining the eigenvalues and
eigenvectors of a matrix. One of the powerful methods is the inverse
iteration method [4,7]. The inverse iteration method is outlined as
follows:

a) Assume an initial value for the eigenvalue, $\lambda_i = q_i$, and an
eigenvector $z^{(0)}$. The initial value of the
eigenvalue can be chosen using the Gershgorin circle theorem.

b) Then solve the system of linear equations

$$ (A - q_i I)\,y^{(p+1)} = z^{(p)}, \quad p = 0, 1, 2, 3, \ldots, \qquad (3) $$

where

$$ z^{(p+1)} = y^{(p+1)} / \|y^{(p+1)}\|_\infty, \qquad (4) $$

$\|\cdot\|_\infty$ is the infinity norm [8], and I is the identity matrix.

As $p \to \infty$,

$$ y^{(p)} = e_i \quad \text{and} \quad 1/\|y^{(p)}\|_\infty = \lambda_i - q_i. \qquad (5) $$

Of course other norms will work and even work somewhat better, but the
infinity norm is very easy to calculate.

The time-consuming operation in this method is solving the system
of linear equations in (3) for $y^{(p+1)}$. This system of equations can be
solved using the bimodal optical computer (BOC) very rapidly relative
to electronics, especially for large n [5]. The algorithm which we
propose in this paper for determining the eigenvalues and the
eigenvectors using the BOC is as follows:

a) Assume a value for $q_i$ and $z^{(0)}$ using the digital processor.

b) Solve the linear system of equations

$$ (A - q_i I)\,y^{(1)} = z^{(0)} \qquad (6) $$

for $y^{(1)}$ using the BOC.

c) Compute the norm $\|y^{(1)}\|_\infty$ and $z^{(1)}$ using a dedicated digital
processor.

d) If $\|y^{(1)}\|_\infty - \|y^{(0)}\|_\infty < \epsilon$, where $\epsilon$ is the acceptable error in
computing the eigenvalues and the eigenvectors, then stop the
iterations; otherwise go back to step b).
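
The loop in steps a) through d) can be sketched as follows. This is a minimal sketch under stated assumptions: an ordinary Gaussian elimination stands in for the BOC solve in step b), the eigenvalue is recovered from a signed component ratio (a small variation on the norm relation, so that shifts above the eigenvalue also work), the stopping test is applied to the relative change of the norm, and the 2×2 matrix, shift, and starting vector are invented for illustration.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (stand-in for the BOC solve)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def inverse_iteration(A, q, z0, eps=1e-12, max_iter=100):
    """Return (eigenvalue nearest the shift q, eigenvector scaled to unit infinity norm)."""
    n = len(A)
    shifted = [[A[i][j] - (q if i == j else 0.0) for j in range(n)] for i in range(n)]
    z, prev_norm = z0[:], None
    for _ in range(max_iter):
        y = solve(shifted, z)                       # step b)
        k = max(range(n), key=lambda i: abs(y[i]))
        norm = abs(y[k])                            # step c): infinity norm of y
        lam = q + z[k] / y[k]                       # signed estimate of the eigenvalue
        z = [v / norm for v in y]                   # step c): normalized vector
        if prev_norm is not None and abs(norm - prev_norm) < eps * norm:  # step d)
            return lam, z
        prev_norm = norm
    raise RuntimeError("inverse iteration did not converge")

A = [[2.0, 1.0], [1.0, 3.0]]   # eigenvalues (5 +/- sqrt(5)) / 2
lam, vec = inverse_iteration(A, q=3.5, z0=[1.0, 1.0])
print(lam, vec)
```

The closer the shift q is to the wanted eigenvalue, the faster the loop converges, which is why a Gershgorin-based starting guess is useful.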

In this algorithm we use the analog optics to compute an
approximate solution for the system of linear equations, which is then
refined using the digital processor. This refined solution has the
accuracy of the digital computer but is determined much faster. This
computation is done using the BOC, which is shown in Fig. 1. The
convergence of the solution of the system of linear equations using
the BOC is discussed in the paper of Abushagur and Caulfield [6].


III SPEED OF THE ALGORITHM

In this section we compare the speed of the
digital computer in determining one eigenvalue of the matrix A
to that of the bimodal optical computer.

The time required for one iteration of the procedure outlined
in Sec. II using the digital computer is [8]

$$ T_D = (7n^3/4)\,T_{D1}, \qquad (7) $$

where $T_{D1}$ is the time required for one digital operation. The time
required to do one iteration, $T_O$, using the BOC is given by

$$ T_O = [\,T_{A1} + 2n(n+2)\,T_{D1}\,]\,I_O, \qquad (8) $$

where $T_{A1}$ is the time required to solve the system of linear equations
by the analog processor and $I_O$ is the number of iterations required in
refining the solution of the system of linear equations using the BOC.
For a clear advantage in speed for the BOC over the digital computer we
need to satisfy the following condition:

$$ \frac{7n^3/4 - 2n(n+2)\,I_O}{I_O}\;\frac{T_{D1}}{T_{A1}} \gg 1. \qquad (10) $$

Eq. (10) can be rewritten as

$$ A_P\,A_I \gg 1, \qquad (11) $$

where

$$ A_I = T_{D1}/T_{A1} \quad \text{and} \quad A_P = [\,7n^3/4 - 2n(n+2)\,I_O\,]/I_O. \qquad (12) $$

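$T_{D1}$ and $T_{A1}$ are independent of the size of the matrix. For a rough
comparison,

$$ T_{A1} \approx 2\ \mu\text{sec}, \qquad (13) $$

and

$$ T_{D1} \approx 1\ \mu\text{sec for a typical microcomputer, and} \qquad (14) $$

$$ T_{D1} \approx 1\ \text{nsec for a CRAY2}. \qquad (15) $$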

If we substitute from Eqs. (13) and (15) into Eq. (11), the condition for
the advantage in speed for the BOC over the CRAY2 will be

$$ A_P \gg 2000. \qquad (16) $$

In Fig. 2 $A_P A_I$ is plotted as a function of the size of the matrix n using
$T_{D1}$ of the CRAY2 computer. It is clear that the BOC can have an
advantage in speed over the CRAY2 if the size of the matrix is in the
range of 50 or larger. This advantage in speed increases with the
size of the matrix, which makes this method very
attractive for such a class of problems.
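
The condition on the speed advantage can be checked numerically. The sketch below evaluates the product of the problem advantage [7n³/4 − 2n(n+2)I_O]/I_O and the inherent advantage T_D1/T_A1, using the rough timings quoted above (about 2 μsec for the analog solve and 1 nsec per CRAY2 operation), and searches for the smallest matrix size at which the product exceeds 1. The function names are illustrative, and the break-even point is only the lower edge of the useful range, since a clear advantage requires the product to be much greater than 1.

```python
def eig_problem_advantage(n, iters):
    """Problem advantage: [7 n^3 / 4 - 2 n (n + 2) I_O] / I_O."""
    return (7 * n**3 / 4 - 2 * n * (n + 2) * iters) / iters

def speed_product(n, iters, t_digital, t_analog):
    """Problem advantage times the inherent advantage T_D1 / T_A1."""
    return eig_problem_advantage(n, iters) * (t_digital / t_analog)

T_A1 = 2e-6        # analog solve time, about 2 microseconds
T_D1_CRAY2 = 1e-9  # one digital operation on a CRAY2, about 1 nanosecond

def break_even_size(iters, t_digital=T_D1_CRAY2, t_analog=T_A1, n_max=10000):
    """Smallest n whose speed product exceeds 1 (break-even, not yet '>> 1')."""
    for n in range(1, n_max + 1):
        if speed_product(n, iters, t_digital, t_analog) > 1.0:
            return n
    return None

for iters in (1, 5):
    n0 = break_even_size(iters)
    print("I_O =", iters, "break-even n =", n0,
          "product at n = 50:", speed_product(50, iters, T_D1_CRAY2, T_A1))
```

With a single refinement pass the product at n = 50 is already on the order of 100, which is consistent with the statement that the advantage becomes clear for matrices of size 50 or larger.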


IV  CONCLUSION 

A new method for solving the eigenvalue problem using the bimodal
optical computer is presented. It is shown that for a well-conditioned
matrix the solution for the eigenvalues and eigenvectors can be
achieved much faster using the BOC than with existing
supercomputers. This advantage in speed becomes very clear for large
matrices.

FIGURE CAPTIONS


Fig. 1 Block diagram for the bimodal optical computer (BOC)
for solving the system of linear equations Ax = b.

Fig. 2 The speed advantage $A_P A_I$ for the BOC over the CRAY2
in solving the eigenvalue problem.

ABSTRACT

Hardware and software design of the Bimodal Optical Computer (BOC) and its
implementations are presented. Experimental results of the BOC for solving a system of linear
equations Ax = b are reported. The effect of calibration, the convergence reliability of the BOC,
and the convergence of problems with singular matrices are studied.

1. INTRODUCTION 

Analog optical systems are becoming very attractive in the area of signal processing because
of their ability to process two-dimensional data in parallel very rapidly. However, analog optical
systems have low accuracy. The BOC [1-4] solves this low-accuracy problem by using a combination
of an analog optical system and a digital processor.

In  this  paper  we  present  experimental  results  using  BOC  for  solving  systems  of  linear 
equations.  In  Section  2  a  comparison  between  astigmatic  optics  and  waveguides  based  algebra 
processors  is  presented.  The  hardware  and  the  software  design  of  BOC  is  in  Section  3.  Section  4 
contains  the  experimental  results  of  the  BOC  for  solving  a  system  of  linear  equations.  The 
conclusions  are  in  Section  5. 

2. ASTIGMATIC OPTICS AND WAVEGUIDES BASED ALGEBRA PROCESSORS

The analog optical system can be applied in many applications. This paper concentrates on
solving a system of linear equations. Goodman [5] has introduced an astigmatic processor to
perform matrix-vector multiplications, which can also be used in a solver for systems of linear
equations. However, the main problem that faces the arrangement in Fig. 1 is aligning the
components to ensure a uniform light distribution along the matrix plane.

Waveguides  can  be  used  to  build  optical  algebra  processors.  By  using  waveguides,  the 
optical  system  can  be  made  compact,  and  its  alignment  will  be  much  easier  than  that  of  the 
astigmatic  system.  The  distribution  of  the  light  across  the  waveguide  is  plotted  in  Fig.  2,  which 
shows  that  the  light  is  almost  uniform  along  the  waveguide.  From  the  practical  standpoint 
waveguides are more reliable to use in these systems than the astigmatic optics.

3. THE BOC DESIGN (HARDWARE AND SOFTWARE)

3.1  BOC  HARDWARE  DESIGN 

The BOC hardware has three main parts, as shown in Fig. 3: the optical system, the
electronic circuit, and the digital processor. The optical system consists of the fully parallel
matrix-vector multiplier. Light from the LEDs representing the components of x is spread
vertically by planar waveguides onto the columns of the matrix mask. The transmitted light is
summed row-wise by another set of planar waveguides and detected by photodiodes, which
represent the output vector b.

The electronic circuit acts as a feedback loop that corrects the input light of the LEDs
until a solution is reached. The solution x is then read and stored by the digital processor.
Fig.  4  shows  the  electronic  circuit  used  for  the  feedback  loop. 

The  A/D  and  D/A  conversion  from  and  to  the  electronic  circuit  are  performed  by  the  digital 
processor. 

3.2  BOC  SOFTWARE  DESIGN 

The BOC software controls the input/output operations. Both the matrix A and the output
vector b are read and stored by the digital processor. The vector b is then converted to analog
voltages by a D/A converter and assigned to the different ports of the electronic circuit.
Because of its limited accuracy, the analog optical processor yields only an approximate
solution. The digital processor reads and stores the approximate solution, $\mathbf{x}^{0}$,
through the A/D converter, then calculates the residue vector, $\mathbf{r}$, as

$$\mathbf{r} = \mathbf{b} - A\mathbf{x}^{0} = A(\mathbf{x} - \mathbf{x}^{0}) = A\,\Delta\mathbf{x} \tag{1}$$

Eq. (1) is multiplied by a scalar $\alpha$ to make use of the whole dynamic range of the system, so
Eq. (1) becomes

$$\alpha\mathbf{r} = A(\alpha\,\Delta\mathbf{x}) \tag{2}$$

If the residue is not small enough, the linear system of Eq. (2) is solved for $\Delta\mathbf{x}$ using the
analog optical processor, and

$$\mathbf{x}^{1} = \mathbf{x}^{0} + \Delta\mathbf{x} \tag{3}$$

A new residue is then found for $\mathbf{x}^{1}$. The iteration continues through Eqs. (1)
to (3) until a satisfactory solution is reached.
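
The loop of Eqs. (1) through (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: `analog_solve` is a hypothetical stand-in for the optical processor that perturbs each component of an exact solution by up to `rel_err`; the scale factor `alpha` (the scalar of Eq. (2)) is chosen here as the reciprocal of the largest residue component; and the stopping test on the relative residue is also an assumption.

```python
import random


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def norm(v):
    """Euclidean norm of a vector."""
    return sum(c * c for c in v) ** 0.5


def exact_solve(A, b):
    """Gaussian elimination with partial pivoting (digital reference solve)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def analog_solve(A, b, rel_err, rng):
    """Hypothetical analog processor: exact answer with up to rel_err error per component."""
    return [xi * (1.0 + rng.uniform(-rel_err, rel_err)) for xi in exact_solve(A, b)]


def boc_solve(A, b, rel_err=0.3, tol=2.0 ** -16, max_iter=100, seed=0):
    """Iterate Eqs. (1)-(3); returns (solution, number of correction steps)."""
    rng = random.Random(seed)
    x = analog_solve(A, b, rel_err, rng)                      # approximate x^0
    for k in range(max_iter):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]      # residue, Eq. (1)
        if norm(r) <= tol * norm(b):                          # assumed stopping test
            return x, k
        alpha = 1.0 / max(abs(ri) for ri in r)                # scale to the full dynamic range
        y = analog_solve(A, [alpha * ri for ri in r], rel_err, rng)  # Eq. (2): y ~ alpha * dx
        x = [xi + yi / alpha for xi, yi in zip(x, y)]         # correction, Eq. (3)
    return x, max_iter


x, steps = boc_solve([[4.0, 1.0], [2.0, 3.0]], [6.0, 8.0])
print(x, steps)
```

In this model each correction removes all but a fraction `rel_err` of the remaining error, which is why a processor with tens of percent error can still reach high accuracy after a modest number of iterations.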

4. EXPERIMENTAL  RESULTS 

In  this  section  we  present  the  experimental  results  for  solving  a  system  of  linear  equations 
$A\mathbf{x} = \mathbf{b}$ using the BOC, where A, b, and x are all positive.

The  Log  of  the  error  and  that  of  the  residue  are  plotted  versus  the  number  of  iterations.  The 
error  and  the  residue  are  defined  as, 

$$\text{Error} = \frac{\|\mathbf{x} - \mathbf{x}^{k}\|}{\|\mathbf{x}\|} \tag{4}$$

$$\text{Residue} = \|\mathbf{r}^{k}\| \tag{5}$$

where $\|\cdot\|$ is the Euclidean norm, $\mathbf{x}$ is the exact solution, $\mathbf{x}^{k}$ is the $k$th iteration solution,
and $\mathbf{r}^{k}$ is the $k$th iteration residue.
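
As a small illustration, the two quantities of Eqs. (4) and (5) can be computed as follows. The function names are hypothetical and chosen only for this sketch; the plots would use the base-10 logarithm of these values (`math.log10`).

```python
import math


def euclidean_norm(v):
    """Euclidean norm of a vector given as a list of numbers."""
    return math.sqrt(sum(c * c for c in v))


def relative_error(x_exact, x_k):
    """Eq. (4): norm of (x - x^k) divided by the norm of x."""
    diff = [a - b for a, b in zip(x_exact, x_k)]
    return euclidean_norm(diff) / euclidean_norm(x_exact)


def residue_norm(A, b, x_k):
    """Eq. (5): norm of the residue r^k = b - A x^k."""
    r = [bi - sum(a * xi for a, xi in zip(row, x_k)) for row, bi in zip(A, b)]
    return euclidean_norm(r)


print(relative_error([3.0, 4.0], [0.0, 0.0]))
print(residue_norm([[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0], [0.0, 0.0]))
```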

Since we are dealing only with positive numbers in this paper, we used the absolute value of
$\mathbf{r}$ to solve Eq. (2), then we set

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} + \Delta\mathbf{x} \tag{6}$$

when all the components of $\mathbf{r}$ are positive, and

$$\mathbf{x}^{(n+1)} = \mathbf{x}^{(n)} - \Delta\mathbf{x} \tag{7}$$

if all the components of $\mathbf{r}$ are negative. We reject the iteration when the components of
$\mathbf{r}$ have different signs and keep the previous solution. Rejecting some iterations amounts
to rejecting some corrections, which slows the convergence.
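
The accept/reject rule described above can be sketched as a single function. This is an illustration only: the body of Eq. (7) is not legible in the source, so the subtraction used for an all-negative residue is an assumption inferred from the surrounding text, and the treatment of zero residue components (rejected here) is also an assumption. The name `signed_update` is hypothetical.

```python
def signed_update(x, r, delta_x):
    """Apply the sign-based update rule.

    x        current solution estimate
    r        current residue vector
    delta_x  correction obtained by solving Eq. (2) with |r| (non-negative)

    Returns (new_x, accepted).
    """
    if all(ri > 0 for ri in r):
        # Eq. (6): all residue components positive, add the correction.
        return [xi + di for xi, di in zip(x, delta_x)], True
    if all(ri < 0 for ri in r):
        # Assumed form of Eq. (7): all components negative, subtract it.
        return [xi - di for xi, di in zip(x, delta_x)], True
    # Mixed signs: reject the iteration and keep the previous solution.
    return list(x), False


print(signed_update([1.0, 2.0], [0.1, 0.2], [0.5, 0.5]))
print(signed_update([1.0, 2.0], [-0.1, -0.2], [0.5, 0.5]))
print(signed_update([1.0, 2.0], [0.1, -0.2], [0.5, 0.5]))
```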

In  all  the  experiments  performed,  the  iteration  process  is  stopped  when  a  16  bit  accuracy  is 
reached.  Fig.  5  shows  that  the  BOC  started  with  almost  30%  error  and  it  needed  6  iterations  to 
converge  to  16  bit  accuracy.  In  Fig.  6(a)  BOC  started  with  almost  110%  error,  and  the  number  of 
iterations  needed  was  21.  Fig.  6(b)  shows  the  Log  of  the  residue  as  a  function  of  the  number  of 
iterations. The fluctuations seen in Figs. 6(a) and (b) are due to the rejection method used in
the experiments.

4.1  EFFECT  OF  CALIBRATION 

The  analog  optical  system  error  is  a  major  factor  in  the  rate  of  convergence  of  the  BOC.  If 
that  error  is  reduced,  then  the  convergence  is  much  faster.  In  order  to  illustrate  this,  the  same 
problem has been solved twice with two different accuracies of the optical system. The analog
optical system's error was 50% in the first run and 30% in the second. Twenty-one
iterations were needed by the BOC to converge to 16-bit accuracy in the first case. For the
second  case  the  number  of  iterations  was  reduced  to  12.  These  results  are  plotted  in  Fig.  7. 

4.2  RELIABILITY  OF  THE  SYSTEM 

System reliability for convergence has been tested and verified by solving the same problem
several times under different conditions. Results show that when the BOC is used to solve a
problem several times, the convergence rate is not exactly the same in all cases.
However,  the  number  of  iterations  needed  by  the  BOC  to  converge  to  a  certain  accuracy  is  almost 
the  same.  Fig.  8  shows  three  different  paths  of  convergence  for  the  same  problem.  The  BOC 
needed  13  iterations  in  the  first  run,  14  iterations  in  the  second,  and  11  in  the  third. 

4.3  SOLUTION  CONVERGENCE  FOR  THE  SINGULAR  MATRIX  SYSTEM 

Solving a system of linear equations with a singular matrix A is one of the problems that
cannot be solved using conventional digital computer techniques. Singular matrices have a
condition number equal to infinity, so their inverse does not exist; such systems also have an
infinite number of solutions. However, the BOC can be used to solve such systems [6]. The BOC
converges much faster when A is singular, because a nonsingular matrix has a
unique solution. Because a singular system has infinitely many solutions, the BOC produces a
different solution each time the same problem is solved. Fig. 9 shows the BOC
convergence for a singular matrix.


5. CONCLUSIONS

The  BOC  system  was  built  and  experimentally  tested.  The  experimental  results  show  great 
reliability  of  the  processor  in  solving  systems  of  linear  equations.  Overall  16  bit  accuracy  of  the 
hybrid  system  was  achieved  with  an  analog  optical  system  of  30%  to  50%  error.  Higher  accuracies 
of the solution can be obtained by increasing the number of iterations. The BOC was also
shown to solve systems of linear equations with singular matrices.

In future work we will consider bipolar numbers, complex numbers, and the use of an SLM for
the matrix mask.

Fig 5. The Log(error) as a function of
the number of iterations. The BOC started
with 30% error.

Fig 6(b). The Log(residue) as a function of
the number of iterations.


Fig  6(a).  The  Log(error)  as  a  function  of 
the  number  of  iterations.  The  BOC  started 
with  100%  error. 


Fig  7.  The  Log(error)  as  a  function  of  the 
number  of  iterations  for  the  same  problem, 
but  with  two  different  accuracies  of  the 
optical  system. 


Fig 9. The Log(error) as a function of the
number of iterations. The same problem
solved twice for a singular matrix A.

APPENDIX B


MASSIVE  HOLOGRAPHIC  INTERCONNECTION 

The basic concept was described in late 1987 (Appl. Opt.
26,  4039).  An  extension  soon  followed  (Lasers  &  Optronics, 
1989).  This,  in  turn,  was  followed  by  a  detailed  analysis  of  our 
concept  and  1988  reinventions  of  it  in  the  U.S.,  England,  and 
Korea  (Appl.  Opt.  28,  311).  A  book  chapter  on  this  subject  is 
now  under  preparation. 

While full optical interconnection of an $N \times N$ input signal
array to an $N \times N$ output signal array through $N^4$ weighted
interconnects is an important goal for optical artificial neural
systems (ANSs), methods for doing this are rare. Goodman
et al. [1] fully connected an $N \times 1$ array to a $1 \times N$ array.
Sawchuk [2] has suggested a fixed $N^4$ interconnection method
using replicated holograms for optical cellular logic. This
works in principle but has extreme space-bandwidth requirements
for large $N$. Sawchuk has described a 3-D dynamic
interconnection network for interconnecting 2-D $N \times N$
arrays in parallel computing, but this network does not
have arbitrarily variable weights [3]. I hope to show a simple
optical $N^4$ interconnection method which uses only one noncritical
lens, an $N \times N$ reflective spatial light modulator, and
a beam splitter as components.

It is convenient to think of the $N \times N$ input array as a
matrix $A$ with components $a_{kl}$. Likewise the output is an
$N \times N$ array $B$ with components $b_{ij}$. These are interconnected
by a 4-D tensor $T$, i.e.,

$$B = TA \tag{1}$$

Equivalently,

$$b_{ij} = \sum_{k,l} T_{ijkl}\, a_{kl} \tag{2}$$

Let us denote by $T_{ij}$ the $N \times N$ array of $T_{ijkl}$ elements
arranged in the same way as the $a_{kl}$ elements. That is, the
tensor $T$ can be thought of as $N^2$ different $N \times N$ weight
arrays of the form $T_{ij}$, where $T_{ij}$ is an $N \times N$ array of $T$
elements needed for Eq. (2). Dropping the subscript $kl$ from
$T_{ijkl}$ to $T_{ij}$ is done for clarity in the following.
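
The contraction in Eq. (2) can be written out directly. The sketch below is only a numerical illustration of what the optical system computes in parallel; the nested-list layout `T[i][j][k][l]` and the function name `interconnect` are assumptions made for this example.

```python
def interconnect(T, A):
    """Compute b_ij = sum over k, l of T[i][j][k][l] * A[k][l] for an N x N input A."""
    n = len(A)
    return [
        [
            sum(T[i][j][k][l] * A[k][l] for k in range(n) for l in range(n))
            for j in range(n)
        ]
        for i in range(n)
    ]


# Example weights for N = 2: T_ij selects the single input element a_ji,
# so the interconnection transposes the input array.
N = 2
T = [
    [
        [[1.0 if (k, l) == (j, i) else 0.0 for l in range(N)] for k in range(N)]
        for j in range(N)
    ]
    for i in range(N)
]
A = [[1.0, 2.0], [3.0, 4.0]]
B = interconnect(T, A)
print(B)
```

Each `T[i][j]` here plays the role of one $N \times N$ weight array $T_{ij}$, i.e., the pattern stored in one subhologram.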

Figure 1 shows the basic scheme. The $A$ matrix is inserted
at the right side into the optical system via a reflective spatial
light modulator (SLM). An $N \times N$ hologram array (which
may be so large that it needs to be demagnified by relay
optics before use as shown in Fig. 1) is illuminated by a
reconstruction beam and provides the $N^2$ $T_{ij}$ arrays. In Fig.
1, we see that the $T_{ij}$ arriving at the reflective SLM are $N^2$
products of the form $T_{ijkl}a_{kl}$. These are collected in the $B$
plane (the image of the hologram array).

Figure 1. Configuration of a parallel optical interconnection between
an $N \times N$ input array $A$ and an $N \times N$ output array $B$. An $N \times N$
array of holograms, each containing an $N \times N$ mask array, is
illuminated in parallel to produce the $N^4$ interconnections.

In practice it may be necessary to make minor modifications
to the apparatus of Fig. 1. The hologram array, SLM,
and detector array may be of different sizes. This requires
relay lenses to magnify or demagnify one or more of these to
achieve the Fig. 1 configuration or some simple variant of it.
For example, the hologram array may be quite large. Assuming
a 2-mm diam hologram to store a $1000 \times 1000$ $T_{ij}$
mask, we need a 2- × 2-m hologram array to store $10^{12}$
weights. This certainly precludes some uses. A $500 \times 500$
$T_{ij}$ mask needs a 1-mm hologram, and we only need a 50-
× 50-cm array to store $(500)^4 = 6.25 \times 10^{10}$ weights. Fresnel
diffraction considerations make it desirable to keep the holograms
larger than or equal to 1 mm. Thus if we drop to a
$128 \times 128$ $T_{ij}$, we need a 12.8- × 12.8-cm array to store the
$(128)^4 = (2^7)^4 = 2^{28} \approx 2.7 \times 10^8$ weights.
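
The sizing arithmetic above is easy to check. In the sketch below the hologram diameters for $N = 1000$ (2 mm) and $N = 500$ (1 mm) are taken from the text; the 1-mm diameter for $N = 128$ is inferred from the quoted 12.8-cm array side. The function name `hologram_array` is hypothetical.

```python
def hologram_array(n, hologram_mm):
    """Return (array side in meters, number of stored weights) for an
    n x n array of subholograms, each hologram_mm wide and storing n x n weights."""
    side_m = n * hologram_mm / 1000.0
    weights = n ** 4
    return side_m, weights


for n, d_mm in [(1000, 2.0), (500, 1.0), (128, 1.0)]:
    side_m, weights = hologram_array(n, d_mm)
    print(f"N = {n}: {side_m:.3f} m on a side, {weights:.3e} weights")
```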

To  record  each  subhologram  we  must  reverse  Fig.  1.  A 
point  source  at  the  i,  j  position  in  the  B  plane  illuminates  the 
SLM. The $T_{ij}$ pattern is written onto the SLM. A coherent
reference  beam  conjugate  to  the  Fig.  1  reconstructing  beam 
allows  the  subhologram  to  be  recorded. 

Optical parallel $N^4$ interconnections are seen to be quite
straightforward. No technology breakthroughs are required
to achieve $N = 10^3$ or $N^4 = 10^{12}$. Recording the master
hologram  as  a  whole  or  in  parts  may  prove  slow,  but  mass- 
produced  copies  can  be  made  quickly  and  inexpensively. 


This  work  was  sponsored  primarily  by  the  Department  of 
the  Navy  under  contract  N00014-86-K-0591. 

A Breakthrough For Optical Neural Nets


Figure 1. One trillion weighted, parallel interconnections for optical neural-net
computing tasks can be accomplished with the above design. Data input is achieved
with an $N \times N$ SLM input array. Each element of the input array is combined with an
$N \times N$ weighted array from an $N \times N$ holographic array into an $N \times N$ output array.


Since the advent of the laser, a quiet struggle has been going on between optics
and electronics. For a century it seemed that Maxwell's equations were the only
laws these two fields had in common. Now, however, diode lasers and thousands
of kilometers of fiberoptic cable are staging a virtual takeover in telecommunications,
erasable optical disks are edging in on conventional magnetic storage, and
even the hallowed halls of electronic computers are being infiltrated by new
optoelectronic techniques.

The latest optoelectronic coup is being staged by H.J. Caulfield, director of the
Center for Applied Optics at the University of Alabama in Huntsville. Caulfield
has put forth an "existence proof" that he says shows "that there is a vital task
in computing that optics can do now and electronics can never do." What's more,
he says it can be done with technology that has been around for twenty years. If true,
this is the breakthrough that optical computing has been waiting for. As Caulfield
puts it, "I think what we're seeing is the real birth of optical computing."

In an interview with L&O on August 19, Caulfield revealed that optical techniques
offer the only possible solution to the massive, parallel, weighted interconnections
of neural networks. Neural networks, or "neural nets," form the foundation of
a computer architecture designed to mimic the human brain by forming millions, even
trillions, of individual, parallel interconnections. As with neurons, all these
interconnections could be individually weighted and connected to an equal number of
outputs. Such an architecture was developed by Warren McCulloch as long ago as the
mid 1920s, but has only recently been studied as a possible solution to highly
complex, repetitive computing problems requiring high-speed solutions, such as
pattern recognition.

Caulfield used reductio ad absurdum to prove the futility of making $10^{12}$ parallel,
weighted interconnections electronically. He explained that since electrons interact
with one another, the connections would all have to be made with individual
"wires" or electron carriers. Submicron carriers on silicon chips have just become
possible, so it is not inconceivable that such carriers could be made and packed
together as closely as 1 micrometer by the next century. For $10^{12}$ connections, the
wires would have to be packed together in a two-dimensional array 1 x 1 meter, and
then rearranged somehow to form the interconnections, which, according to
Caulfield, would require wire lengths of 10 meters or so. That leads to a 10-m$^3$
conductive mass with bothersome inductance and crosstalk.

Furthermore, even if the conductivity (weights) of all these wires could be set
independently with no space, time, or cost penalty, each wire must be connected to
an input and an output. When added to the interconnections, this leaves $4 \times 10^{12}$
attachments to make. If 4,000 connections could be made every second, it would take
$10^{9}$ seconds to complete the task, which adds up to something over 25 millennia.

In a paper submitted to Applied Optics, Caulfield suggests an optical method of
accomplishing the same thing. The technique makes use of holographic technology
from the 1960s: page-oriented holographic memories. Figure 1 shows how
it works.

To produce $10^{12}$ optical interconnections, input data is encoded onto a $10^3 \times 10^3$
array in the form of a transmissive spatial light modulator (SLM), although a reflective
SLM can also be used. Each element of the SLM input array can be assigned
a set of weighted values by means of a large holographic array.

The holographic array is $10^3$ holograms high and $10^3$ holograms wide. Each of
the elemental holograms in this array is made in such a way as to produce a
$10^3 \times 10^3$ pattern onto the $10^3 \times 10^3$ SLM input array. So the holographic array
responsible for assigning the weighted interconnections can be thought of as $N^2$
different $N \times N$ weight arrays, and can be represented by a four-dimensional
tensor $T_{ijkl}$.

When the weighted arrays of holograms are reconstructed with a reference beam
and imaged onto the $10^3 \times 10^3$ SLM input array, a $10^3 \times 10^3$ output array, $B$, is
produced. $B = TA$ and has elements

$$b_{ij} = \sum_{k,l} T_{ijkl}\, a_{kl}.$$

A 2 x 2-meter holographic array consisting of $(10^3)^2$ holograms, with each
2-mm-diameter hologram storing a $10^3 \times 10^3$ weight array, would yield $10^{12}$ weighted
interconnections when combined with the $10^3 \times 10^3$ SLM input array.

It is impossible to fathom the effects this development will have on optical
neural-net design, but Caulfield has managed to combine mindboggling complexity with
stupefying simplicity. He suspects that the most time-consuming part of his optical
neural-net design will be learning how to weight the input array for each optical
computing task. This, he believes, could take months to years and require the
learning capacities of traditional electronic computers. Once a master weight array is
produced, however, neural-net operation time should be in the realm of milliseconds,
and successfully "programmed" master holograms could be cloned in seconds
for mass production.

"The role of electronics, with its great flexibility and accuracy, is learning. The
role of optics is doing what electronics learns," says Caulfield. "Brains use the
same equipment for both tasks, but why should we? We should let optics and
electronics do what each does uniquely well. The war between optics and electronics is
a foolish one. They each have a major role. What we have done is [to] show, in
one vital area, what those roles are."

—  Tom  Higgins 



Massive holographic interconnection networks and
their limitations


Joseph  Shamir,  H.  John  Caulfield,  and  R.  Barry  Johnson 


Fundamental  and  practical  limitations  to  be  encountered  in  the  implementation  of  massive  free  space  optical 
interconnects  are  discussed  in  detail,  and  some  improved  architectures  are  proposed.  The  long  term 
optimum  design  uses  currently  unavailable  large  arrays  of  laser  diodes.  An  interim  solution,  using  available 
spatial light modulators, is shown to be capable of storing ~$10^{10}$ bits of information and performing ~$10^{11}$
interconnections/s.


I.  Introduction 

There is an increasing interest in massive optical
interconnection networks for incorporation in communication
and signal processing systems [1-8]. Optical architectures
are particularly attractive for the implementation
of interconnection networks with extremely
high complexity that are impractical with conventional
electronic systems. Neural networks [7-22] that are
based on massive weighted interconnections are good
examples of such systems. Many of the architectures
considered in the above-mentioned references employ
the extensive interconnectivity available in free space
propagation of light waves. Only a few of these publications
have, however, seriously discussed the actual
feasibility of large scale implementation [2,10,11]. A more
common attitude is the description of a system architecture
with a statement on the expected performance.
Sometimes a demonstration is presented with a small
array of input data, but, in many cases, the limitations
imposed on the upscaling possibilities are ignored.
Several limitations stem from fundamental physical
processes such as diffraction and coherence, while others
are due to technical difficulties such as the angular
dependence of spatial light modulators (SLMs) and
the actual shift variance of real spatial filters [23].

The main objectives of this work are the analysis of
the degradation factors that limit the performance of
practical interconnection networks and the derivation
of fundamental and technical constraints on the realization
of an actual physical system. The analysis is
based initially on a recently proposed architecture,
and two additional configurations are introduced in
an attempt to overcome some of the difficulties. Although
we treat here a very specific application, the
results are relevant to a considerable variety of other
optical processing architectures that have been proposed
in the past or will be proposed in the future.

In the next two sections, we describe the anticipated
performance of an ideal system disregarding all constraints
that will be analyzed in Secs. IV-VII in detail.
Part of this analysis is based on a more exact mathematical
description of the whole process given in Appendix A
for a scalar paraxial approximation. For
most parts of the analysis we assume that the system
must perform all possible interconnections among all
the channels available and base the calculations on
worst case conditions. It is to be expected that if these
worst case conditions are replaced by some statistical
average, the derived constraints may be appreciably
relaxed. Furthermore, in many applications one does
not need all the possible interconnections, and then
the system may be divided into subapertures. In that
case our estimated constraints relate to the largest
subaperture and the complete system may become
much larger.

Two new architectures are introduced in Sec. VIII
in an attempt to reduce the constraints derived for
the original architecture. In Sec. IX we perform a
general analysis for the derivation of the laser power
requirements, and in Sec. X a detailed case study is
given with the derivation of design parameters for an
actual system that may be implemented in practice
with presently available devices. This system consists
of $256 \times 256$ input and output arrays with $256^4$ weighted
interconnections capable of performing $10^{11}$ interconnections/s.
Important concluding remarks, related
to the actual implementation of an interconnection
network, are given in Sec. XI.


15  January  1989  /  Vol.  28.  No.  2/  APPLIED  OPTICS  311 


II.  Ideal  Performance  of  Basic  Architecture 

The basic configuration, shown in Fig. 1, is the transmissive
version of the reflective system described in
Ref. 7. It consists of a hologram array of linear dimensions
$H$ containing $N_h \times N_h$ holographic optical elements,
an SLM of size $S$ with $N_s \times N_s$ pixels sandwiched
between two lenses with respective focal
lengths $f_1$ and $f_2$, and a detector array $D$ with $N_d \times N_d$
detector elements. The $ij$th hologram in the array is
imaged by the double-lens configuration onto the $ij$th
element of the detector array. This hologram diffracts
light from a reconstruction beam with an efficiency $t_{ijkl}$
toward the $kl$th pixel in the SLM. The same pixel
receives a weighted fraction of the light diffracted also
from all other holograms, but, assuming a linear interaction
in the SLM, these are separated again on arrival
at the detector array. Thus, ideally, each detector
receives the sum of all the weighted beams just from a
single hologram element. Mathematically, if the power
transmittance of the $kl$th pixel in the SLM is $a_{kl}$, the
total power received by the $ij$th detector will be

$$b_{ij} = \sum_{k,l} t_{ijkl}\, a_{kl}, \tag{1}$$

where, for the time being, coherence effects have been
ignored.

Performance degradation due to coherence is just
one of the factors that is discussed later along with
some other effects that limit the scale-up capability of
this and many other architectures. This system in its
ideal form may be viewed either as a matrix-matrix
multiplier of a 4-D matrix by a 2-D matrix or as a
vector-matrix multiplier with vectors of $N_s \times N_s$ dimensions,

$$TA = B. \tag{2}$$

The elements of the input vector (or matrix) are
introduced by the transmittance of the SLM pixels
with the hologram providing the fixed matrix $T$.
The output vector is read out from the detector array.

Alternatively,  we  may  consider  this  an  interconnec¬ 
tion  network  with  channels  that  are  interconnected 
by  iV;  x  N£  weighted  interconnections  that  are  hard¬ 
wired  for  a  given  hologram  array. 

III.  Hologram  Recording 

To implement the above architecture one must also
devise a system for recording the required large hologram
array. Within the present state of the art the practicality
of computer generation with electron-beam
writing appears to be out of the question for these large
arrays. Thus one must resort to optical recording,
preferably with computer assistance [10,11]. Several procedures
may be envisioned for the implementation of
the hologram recording process. The most obvious of
these processes is based on the same optical system as
the interconnection network itself (Fig. 1) where each
element of the detector array is replaced one at a time
by a point source. A useful realization of this point
source may be the endface of a single-mode optical
fiber that can be easily positioned and aligned with a
computerized robotic arm. To record the $ij$th hologram
this source is positioned at the location of the $ij$th
detector pixel, oriented for optimal illumination of the
SLM, and imaged onto the hologram by the two lenses.
The SLM, sandwiched between the lenses, writes the
desired interconnection pattern. This special lens
configuration is useful to keep all incident ray angles
on the SLM constant for a given hologram allowing for
adjustments to take care of the angular variation of the
SLM transmission characteristics. The constraints
related to the operation of SLMs are discussed in more
detail later. To attain small repeatable high quality
holograms, a random phase plate over the SLM may be
useful [24] as discussed further in Sec. VI. The overall
process of recording and reconstruction is mathematically
evaluated in Appendix A within the paraxial approximation
for an ideal case using operator notation [25-27].

Fig. 1. Basic configuration for an $N^4$ interconnection network: H,
hologram array illuminated by a reconstruction beam R; SLM, spatial
light modulator between two lenses $L_1$ and $L_2$ with their respective
focal lengths $f_1$ and $f_2$; D, detector array or an array of nonlinear
optical devices.

In  the  above  recording  configuration  it  was  assumed 
that  an  oblique  reference  beam,  conjugate  to  the  one 
indicated  on  the  figure,  is  incident  on  the  hologram. 
Alternatively,  one  may  use  a  point  source  reference 
situated  on  the  optical  axis  at  the  SLM  plane.  This 
will  allow  an  axial  reconstruction  beam  resulting  in  a 
reduced  bandwidth  requirement  for  the  holograms 
and a simpler reconstruction configuration. The penalty
for these benefits is removal of the central portion
of the SLM and the introduction of aberrations induced
by spherical-wave recording and reconstruction.

IV.  Semiquantitative  Constraint  Estimation 

The exact analysis of the physical processes involved
in the operation of the proposed architecture is quite
complicated and outside the scope of this work. Nevertheless,
appreciable insight can be obtained by evaluating the
diffraction effects in the scalar paraxial approximation.
In Appendix A we present a Fourier optics description of
the complete process starting from the hologram recording
stage and ending at the detection of the output vector.
Keeping in mind the results in the Appendices, in the
present section we use a somewhat different approach that
allows us to take into account, in a semiquantitative way,
effects induced by off-axis propagation.

To obtain an estimate of the limitations imposed on
the system of Fig. 1 we consider first the diffraction
effects occurring while light propagates from each
hologram in the array toward the various pixels of the
SLM. For proper performance we must require that
most of the light addressing a pixel in the SLM be
incident on this pixel, with only a negligible fraction
spilling over to other pixels.

We denote the linear dimensions of the respective
pixel sizes in the hologram array, the SLM, and the
detector array by $p_h$, $p_s$, and $p_d$, and the respective
center-to-center distances of the pixels by $d_h$, $d_s$, and
$d_d$. As discussed later, the center-to-center distances
are not necessarily equal to the pixel sizes. With the
parameters defined earlier we have the relations

$$N_s = S/d_s, \qquad N_h = H/d_h, \qquad N_d = D/d_d. \tag{3}$$

If we assume diffraction-limited performance with
circular hologram elements, we may require that $d_s$ be not
smaller than the diameter of the first Airy disk obtained
by focusing an aperture the size of the hologram onto the
SLM plane. That diameter is

$$a = 2.44\,\frac{\lambda r}{p_h}, \tag{4}$$

where $r = f_1/\cos\theta$ is the distance between the $ij$th
hologram and the $kl$th SLM pixel (see Fig. 2). We also
must consider the elongation of the spot due to the
inclination of the beams incident on the surface of the
SLM. In principle we should also consider the $\cos^4\theta$
flux density falloff, but we assume that this may be
precompensated for during the hologram recording
stage by modifying the assigned weights.

The optical configuration with the hologram in the
focal plane of the lens ensures that all beams from a
single hologram are incident on the SLM approximately
at the angle at which the central pixel is addressed.
Thus from Fig. 2 it is evident that the maximal value of
this angle, $\theta_{\max}$, is obtained for the hologram situated at
the corner of the array and is given by

$$\tan\theta_{\max} = \frac{H}{\sqrt{2}\,f_1}. \tag{5}$$

All beam spots for this marginal hologram will be
elongated by a factor $1/\cos\theta_{\max}$. In calculating the
distance $r$, larger angles should also be considered, but,
to reduce the algebraic complexity, we take into account
only an average distance traveled by the various beams
emerging from this hologram, keeping in mind that the
actual situation is worse. For this average distance we
may put $r \approx f_1/\cos\theta_{\max}$. With all these considerations a
minimal requirement for pixel separation is given by

$$d_s \ge 2.44\,\frac{\lambda f_1}{p_h \cos^2\theta_{\max}}. \tag{6}$$

Solving Eq. (5) for $f_1$ and substituting into Eq. (6), we
obtain

$$\frac{d_s}{\lambda} \ge \frac{3.45}{\sin 2\theta_{\max}}\,\frac{H}{p_h}. \tag{8}$$

This relation may be regarded as the constraint set by
the requirement of diffraction-limited performance.


Fig. 2. Definition of geometrical parameters: H, hologram plane;
S, SLM plane or the detector plane in the modified architecture; $r$,
distance between the $ij$th hologram and $kl$th SLM pixel. Polarization
vectors P and P' as well as the angle $\alpha$ and propagation vector
k are discussed later.

In its present form one may interpret it as a limitation
on the ratio $H/p_h$ for a given SLM (that is, a given value
of $d_s$) operating at a given wavelength. From the
optical designer's point of view, $\theta_{\max}$ is determined by
the numerical aperture of the optical system. However,
as shown in later sections, additional constraints
should be considered. As a demonstrative example of
the meaning of relation (8), we assume the unlikely
angular limitation $\theta_{\max} = 45^\circ$ (corresponding to an
f/No. of 0.7) and take $\lambda = 0.5$ μm. We then need a pixel
center-to-center distance $d_s \approx 173$ μm to permit a
ratio $H/p_h$ as large as 100. This means that under these
conditions one is limited to a hologram array of
$N_h \times N_h = 100 \times 100$ elements unless holograms are
allowed to overlap spatially or, alternatively, an
appreciable amount of crosstalk is allowed. The number of
elements in the SLM, however, is not limited by relation
(8), and if we want to implement a system with input
vectors of rank $N_s \times N_s = 1000 \times 1000$ we need an SLM
with $S = 17.3$ cm.
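
The figures quoted in this example can be checked numerically. The following is a minimal sketch in Python, assuming relation (8) is applied as an equality; the function name `min_slm_pitch` and the variable names are illustrative and do not come from the original analysis.

```python
import math

def min_slm_pitch(wavelength, ratio_H_over_ph, theta_max_deg):
    """Smallest SLM pixel pitch d_s allowed by relation (8).

    The result is in the same length unit as `wavelength`.
    """
    sin_2theta = math.sin(math.radians(2.0 * theta_max_deg))
    return 3.45 * wavelength * ratio_H_over_ph / sin_2theta

wavelength_um = 0.5                                   # wavelength in micrometers
d_s_um = min_slm_pitch(wavelength_um, 100, 45.0)      # pitch for H/p_h = 100
slm_side_cm = 1000 * d_s_um * 1e-4                    # 1000 pixels per side, um -> cm
print(f"d_s = {d_s_um:.1f} um, S = {slm_side_cm:.2f} cm")
```

With these inputs the sketch gives a pitch of about 172.5 μm and an SLM side of about 17.25 cm, in agreement with the rounded values in the text.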

The limitations on the absolute size of the holograms
may be determined by considering the requirements
for space-bandwidth product, or rather a quantity that
we shall call information content (IC). If the resolution
of the holographic material is $l$ and it can record
$g_h$ distinct gray levels, we obtain the approximate value

$$IC_h = \left(\frac{p_h}{l}\right)^2 g_h. \tag{9}$$

If the SLM has a gray level capability of $g_s$ levels, its IC
is

$$IC_s = N_s^2 g_s, \tag{10}$$

which should satisfy the relation $IC_h \ge IC_s$, leading to

$$p_h \ge l N_s \sqrt{g_s/g_h}. \tag{11}$$

It should be noted here that, apart from material
limitations, $l$ is also limited by the recording wavelength: a
holographic grating can never have an interference
pattern with spatial frequency higher than $2/\lambda$, and in
most recording configurations one has $l > \lambda$. Continuing
with our previous example with $N_s = 1000$, taking
the limiting value $l = 0.5$ μm, and assuming $g_h = g_s$, we
obtain $p_h \ge 500$ μm. Usually we shall need at least
twice this value to incorporate the hologram carrier
frequency for off-axis reconstruction. If a random
phase modulator is attached to the SLM, as proposed
in Ref. 24, $IC_s$ increases, which necessitates an even
larger hologram.
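
The hologram size in this example can be reproduced with a short calculation. The sketch below is in Python and rests on an assumed information-content model: the hologram IC is taken as the number of resolvable cells, $(p_h/l)^2$, times the number of gray levels, and the SLM IC as $N_s^2$ times its gray levels. The function name and the gray-level count of 16 are illustrative only; the result does not depend on the gray-level count as long as both devices have the same one.

```python
import math

def min_hologram_size(resolution, n_slm, gray_slm, gray_holo):
    """Smallest hologram side p_h for which IC_h >= IC_s.

    Assumed model: IC_h = (p_h / l)^2 * g_h and IC_s = N_s^2 * g_s.
    The result is in the same length unit as `resolution`.
    """
    return resolution * n_slm * math.sqrt(gray_slm / gray_holo)

# l = 0.5 um, N_s = 1000, equal gray-level capability (value is arbitrary)
p_h_um = min_hologram_size(resolution=0.5, n_slm=1000, gray_slm=16, gray_holo=16)

# Factor of two to accommodate the carrier frequency for off-axis reconstruction
p_h_with_carrier_um = 2 * p_h_um
print(p_h_um, p_h_with_carrier_um)
```

Under these assumptions the minimum hologram side is 500 μm, or about 1 mm once the carrier frequency is included.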

This first-order estimation leads to severe limitations
on the number of elements that may be processed
in parallel using a single optical system. A more
detailed analysis is required to derive trade-offs
applicable to specific system design. For example, one may
consider partial overlapping of holograms to obtain a
larger number of holograms while still satisfying the
restriction on the ratio between hologram size and
array size [Eq. (8)]. One may also relax the crosstalk
limitations that will influence relation (6). On the
other hand, we must consider the deteriorating
contribution of coherence effects, aberrations, scattered
light, and performance limitations of SLMs.

V.  Crosstalk  Considerations 

In the configuration of Fig. 1, crosstalk occurs on the
SLM plane and also on the detector plane. The light
from a single hologram is split into several beams of
various intensities, each of which is ideally focused into a
separate pixel of the SLM. These beams are modulated
by the SLM and then converge to a single detector
element. The crosstalk on the SLM plane originates
mainly from light injected into the $kl$th pixel from
matrix elements addressed to different pixels with
$k'l' \ne kl$. As long as the interaction in the SLM is
linear, the mixing of light from several holograms has
no effect at this plane but becomes important on the
detector plane, where light originating from one
hologram leaks through to unintended detector elements.

To consider crosstalk over the SLM plane, we denote
by $\epsilon_s$ the power originating from a certain hologram
that may be incident on a single pixel due to leakage
from other beams that are not supposed to contribute
to this pixel. One may state that the weight attributed
to this element for a certain interconnection is
increased by this value. However, since $\epsilon_s$ usually has a
statistical nature, we may consider it as the error
assigned to the element of the interconnection matrix.
Thus, instead of having a well-determined weight
multiplying each SLM element, we must include some
average bias level $\epsilon_s/2$ and write

+  <«> 

This error has several contributions that include
diffraction, aberrations, inclination factors, scattered
light, and coherence effects.

A similar effect may be observed on the detector
plane, where ideally the light emerging from each
hologram should be focused into a single pixel. Denoting
the contribution of light power from other holograms
by $\epsilon_d$ we obtain the value of the vector elements with
error,


^-x 


'+rui) 


<«„), 


1 1 3 ) 


where we noted the dependence of the error terms on
location and took into account the bias terms due to
the noise. Naturally these terms also depend on the
matrix and vector elements themselves, so the
above relation is essentially nonlinear in the matrix
and vector components. One may therefore only estimate
some maximal error values and possibly determine
their statistical nature for a given situation.

According to our design objectives, the dimensionality
of the detector plane should be equal to the
dimensionality of the hologram plane (that is, $N_d = N_h$,
although not necessarily equal to $N_s$). Since our optical
system is essentially an imaging system between the
detector and hologram planes, we have the geometrical
relations


D 


'  15) 


In  principle,  we  also  have  the  relation 


where $p_h'$ is the image of $p_h$ over the detector plane.
However, because the reconstructed wavefront over $p_h'$
is the phase conjugate of the writing beam, the
distribution within $p_h'$ will be quite nonuniform. As a
matter of fact, if $p_h$ is very large and the SLM has unit
transmittance, the complete reconstructed wavefront
will be concentrated into the region occupied by the
source during hologram writing. As is evident from
the paraxial calculation of Appendix A, if we have the
SLM in place, according to Eq. (A25), the power
distribution will approximately be (ignoring coherence
effects to be discussed in the next section) that of a sinc$^2$
function, the extent of which is determined by the
SLM pixel size. If the hologram has a finite size, this
distribution will be widened by a convolution containing
the window function as derived in Eq. (A27). If
this window function is not too small, that is, it satisfies
a relation of the form (8), we may state that the
crosstalk over the detector plane is generally proportional
to the diffraction spot size of the SLM pixel over this
plane. Using considerations similar to those leading
to Eq. (6), we may write for the power that spills over
the area of the detector pixel


u-(  -  —  Y- 

\titP,  nn26mtx) 

where $\theta'_{\max}$ is the maximal angle in the detector plane.
If we define this angle in a similar way to the definition
of $\theta_{\max}$, we have here too a geometrical relation


US) 


The above discussion indicates that the crosstalk
term at the detector plane and, according to the
discussion in the previous section, also the crosstalk term on
the SLM plane are both inversely proportional to the
SLM pixel size. Thus, to minimize crosstalk due to
diffraction one would like to increase $p_s$ as much as
practically feasible. However, if we intend to limit the
SLM pixel size to its minimal value according to Eq.
(6), we may substitute the equality relation of Eq. (8)
into Eq. (17) to obtain

/  P>\dJ i  sin20mu  y 

(d  «  I -  ■  (19) 

where  we  used  Eq.  (15).  For  the  crosstalk  over  the 
SLM  we  may  write  a  similar  expression: 

«  j _ _ y.  ,20) 

■  \dj>H  •in28m,I ) 

Multiplying  the  last  two  equations  we  may  write 

«,«„  «  ( - - - Xtf  Y  •  (21) 

\<W,  Sin2flmu  ) 

For  relatively  small  angles  we  may  also  write  with  the 
help  of  Eq.  (18) 

— - - VfA  •  (22) 

\f\ddP,  sin2#  / 

This relation is quite interesting as it indicates some
possibilities for improving system performance by
increasing the SLM and detector pixel sizes and the ratio
$f_1/f_2$. Unfortunately, these parameters cannot be
adjusted independently, and they must be considered
together in some optimization process.

To evaluate the order of magnitude of the crosstalk
errors we follow the analytical results of Appendix A
with some numerical calculations assuming rectangular
pixels as shown in Fig. 3. To evaluate the crosstalk
over the detector plane we may start from Eq. (A25)
and keep in mind that a similar procedure applies also
for the SLM plane. We may normalize the argument
of the sinc function to the value of its argument at its
first zero:


where a detector pixel size

$$p_d = 2x_0 = \frac{2\lambda f}{p_s} \tag{24}$$

covers the whole central lobe of the sinc function. It
should be noted that this size is smaller than the one
used in the previous estimations.

Fig. 3. Layout of detector array.

Fig. 4. Crosstalk percentage as a function of relative center-to-center
distance of detectors. Parameter is detector size relative to
central lobe of sinc function.

If we illuminate the detector array with such a sinc
function centered on a detector denoted by A in Fig. 3,
the detected intensity will be the integrated square of
the sinc function over the area of each detector element.
The curves in Fig. 4 are the calculated integrals
over pixels situated relative to A as the ones denoted by
B. The values are given as the percentage relative to
the integral over pixel A as a function of the normalized
pixel separation $d_d/p_d$ for three normalized values of $p_d$,
with the middle curve ($p_d = 1$) corresponding to the
value given by Eq. (24). As expected, the crosstalk
increases drastically if the interpixel distance drops
below the size of the central lobe, but there is
appreciable crosstalk even for quite large separations.
The curves indicate that we do not gain much by
enlarging the detector elements much above $2x_0$. Assuming
that this is a good choice for the dimension of
the active part of the detector ($p_d = 2x_0$), we obtain a
calculated value for the crosstalk between two adjacent
detectors of 3.5% if they touch each other ($p_d = d_d$).
This value for $d_d$ is technically not feasible, and
we instead take $d_d = 4x_0$, with which we obtain a crosstalk
of 0.74%. The crosstalk to a more distant pixel (D
in Fig. 3) with its center at $8x_0$ is ~0.18%, while the
value for the nearest diagonally positioned pixel (C in
Fig. 3) is only 0.075%. Assuming this arrangement we
observe a maximum of four pixels each contributing
0.74%, four pixels contributing 0.075%, and four more
contributing ~0.18% each. Taking into account the
smaller contributions from more distant pixels, a maximum
estimated crosstalk value is ~4%.
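
The on-axis crosstalk figures above can be checked by direct numerical integration. The following Python sketch assumes an ideal separable sinc$^2$ intensity pattern and detector pixels of width $2x_0$ displaced along one axis only, so that the integral over the transverse coordinate cancels in the ratio. It does not attempt the diagonal pixel C, whose value depends on the exact layout of Fig. 3. The function names and the Simpson integrator are illustrative, not part of the original calculation.

```python
import math

def sinc2(t):
    """sinc^2 with its first zero at t = 1 (t is the position in units of x0)."""
    if t == 0.0:
        return 1.0
    return (math.sin(math.pi * t) / (math.pi * t)) ** 2

def integrate(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def crosstalk_percent(center, half_width=1.0):
    """Power on a pixel centered at `center` relative to the pixel centered at 0.

    Both pixels span the same transverse range, so only the integral along
    the displacement axis matters for the ratio.
    """
    signal = integrate(sinc2, -half_width, half_width)
    leak = integrate(sinc2, center - half_width, center + half_width)
    return 100.0 * leak / signal

for c in (2.0, 4.0, 8.0):
    print(f"pixel centered at {c:.0f} x0: {crosstalk_percent(c):.2f}%")
```

Centers at $2x_0$, $4x_0$, and $8x_0$ correspond to touching neighbors, neighbors at $d_d = 4x_0$, and the second neighbors D, respectively.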
As we have seen, due to the special structure of the
diffraction pattern from a rectangular aperture, the
crosstalk between diagonally positioned pixels is about
an order of magnitude smaller than between adjacent
ones. Consequently, a checkerboard configuration
may be advantageous for some applications even if this
requires a fourfold reduction in the number of pixels.
Naturally, even in this case we cannot forget the second
neighbors D that still contribute 0.18% each, leading
to ~1% combined contribution from the eight nearest
pixels.
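
The combined crosstalk figures for the full array and for the checkerboard layout follow from simple sums of the per-neighbor percentages quoted in the text. A short Python check, with illustrative variable names:

```python
# Per-neighbor crosstalk percentages quoted in the text
adjacent = 0.74    # nearest neighbors at d_d = 4 x0
diagonal = 0.075   # nearest diagonal neighbors (C in Fig. 3)
second = 0.18      # second neighbors at 8 x0 (D in Fig. 3)

# Full array: four neighbors of each kind
full_array = 4 * adjacent + 4 * diagonal + 4 * second

# Checkerboard: adjacent pixels are absent, eight nearest pixels remain
checkerboard = 4 * diagonal + 4 * second

print(f"full array: {full_array:.2f}%, checkerboard: {checkerboard:.2f}%")
```

The sums come to about 3.98% and 1.02%, consistent with the ~4% and ~1% estimates.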

The last number above for the fractional error appears
quite affordable, but we may run into difficulties
even with this quantity. In general, we may define a
quantity $\epsilon_f$ that gives the fractional leakage from one
pixel addressed by a single channel into a neighboring
one. If the maximum possible power arriving at the
detector from a single channel is $I$, the crosstalk to a
neighboring pixel will be $I\epsilon_f$. If we have $M$ such pixels
around each other pixel ($M = 8$ in the above example),
we obtain a maximum crosstalk from these pixels of
$MI\epsilon_f$. Assuming that this contribution is the most
significant and we may neglect other contributions, we
shall obtain the maximal value of the crosstalk when
all $N_s^2$ channels are addressed with full weight to all the
neighboring pixels:

(25) 

which may become much larger than $I$ even for small $\epsilon_f$.
We shall return to this subject and discuss it further in
relation to the estimation of laser power requirements.

All the above considerations presume ideal performance
and alignment. One essential technical factor
to treat is proper alignment. If we have a misaligned
pixel it will be shifted on the curves of Fig. 4. For
example, according to the calculations with the above
configuration, the measured power for a pixel displaced
in one direction by 5% will be off by 1%.

VI.  Coherence  Effects 

In Eq. (1) we assumed that the power contributed by
the different pixels in the SLM is combined incoherently
at a detector pixel. As is quite evident, this
assumption is incorrect since the holograms are illuminated
by coherent light and one must consider coherent
superposition. To do this we have to evaluate the
complete complex amplitude distribution over the detector
plane. As a first-order approximation for coherence
effects we ignore the crosstalk and start from
an ideal infinite hologram, recorded by a point source,
which results in the expression given by Eq. (A25).
That relation gives, apart from a quadratic phase factor
common to the whole detector plane, the complex
amplitude distribution over a single detector pixel due
to a single hologram. This relation is reproduced here
for convenience:

L"  -  £  g„nhljmn3(-d^n/f)  iinc(xp,/\f,yp,/\f).  (26) 

mn 

The various parameters are explained in the Appendices
(we changed the dummy indices to avoid confusion
with the wavenumber $k = 2\pi/\lambda$), and here we just note
that the $g$ and $h$ factors are the amplitude transmittances
of the SLM pixels during the reconstruction
and hologram writing processes, respectively. For our
discussion here the important factor in this expression
is the linear phase factor


3[-d‘mJf\  *  exp 
The power detected by the detector element is given
by


lu‘[!  “22  «. 
■na  m  n 
The terms in the double summation are the same for
both summations. Therefore, the mixed terms occur
twice with the sign of the linear phase inverted. Thus
we may write
This relation gives the power distribution over the
detector plane for reconstruction with a single hologram
having infinite size. To find the actual detector
signal we must integrate over its sensitive area $p_d^2$.
Then the first sum will correspond to Eq. (1) if we put

h;,ki  *  ti;w.  '  30) 

and we are left with the disturbing interference terms
of the second summation. Due to the integration, the
contribution of these interference terms is quite small
except for the smallest phase factors contributed by
nearly neighboring pixels. The worst case is a nearest
neighbor such as $m = m'$, $n = n' - 1$. For this case we
shall have to integrate over an expression of the form


sine -(xp,/\f.yp,/\f) 


1 31 ) 


where $\alpha$ and $\beta$ are two constants determined by the
input vector and interconnection strengths. Since in
the derivation of Eq. (26) we have already extracted
the shift operator from the amplitude distribution [Eq.
(A22)], we assume here that the detector pixel is centered
at $(x = 0, y = 0)$, where, according to the above
relation, the intensity is much higher than for the
incoherent superposition. The amount of this discrepancy,
after integration, depends on the detector pixel
size $p_d$. If we take this size to cover the region up to the
first zero of the sinc function as in Eq. (26), we obtain
the first zero of the interference at
which satisfies the relation $x_1 < p_d/4$, and we have at
least one full oscillation period over the integration
length.

The actual net effect will be a redistribution of the
power over the detector area without much effect on
the total power. As a matter of fact, for this special
choice of parameters a larger fraction of the power is
concentrated around the detector center, which tends
to reduce crosstalk. If we calculate the integrated
power over the detector for this case we obtain 25%
more power concentrated on the detector surface.
However, considering the fact that the relation $d_s > p_s$
is always satisfied, the coherent contribution is much
less pronounced. Taking the parameters of the previous
section, $d_s = 2p_s$, the difference between coherent
superposition and noncoherent superposition is only
0.2%. More distant pixels will contribute interference
terms of higher spatial frequency, which tends to reduce
the uncertainty to a negligible amount. The
uncertainty in the detected value is also increased by
the fact that no prior knowledge is possible about the
relative magnitudes of the parameters $\alpha$ and $\beta$.

An additional error source due to coherent illumination
comes from contamination and irregularities in
the actual system. For example, a dust particle whose
cross-sectional area is a fraction $\nu$ of a pixel area
may scatter light of that fraction very unevenly. Thus
a fraction $\nu$ of power may be removed from one pixel
and injected into another. A 10-μm particle, common
to most laboratory environments, has an area of the
order of 0.3% of the SLM pixel area we derived in Sec.
IV. If we consider this fractional noise as a coherent
amplitude noise, it may amount to ~0.6% of local power
uncertainty.
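
The dust-particle estimate can be reproduced with a few lines of arithmetic. The Python sketch below is one possible reading of the argument: the particle and the pixel are both treated as squares, the pixel size is taken as the 173-μm pitch from the Sec. IV example, and the scattered fraction is treated as an amplitude perturbation in phase with the signal, which roughly doubles it in power. All variable names are illustrative.

```python
pixel_size_um = 173.0   # SLM pixel pitch from the Sec. IV example
particle_um = 10.0      # dust particle size

# Fraction of the pixel area covered by the particle
area_fraction = (particle_um / pixel_size_um) ** 2

# Coherent amplitude perturbation: |1 + nu|^2 - 1 = 2*nu + nu^2,
# i.e. roughly twice the area fraction for small nu
power_uncertainty = (1.0 + area_fraction) ** 2 - 1.0

print(f"area fraction: {100 * area_fraction:.2f}%")
print(f"power uncertainty: {100 * power_uncertainty:.2f}%")
```

Under these assumptions the area fraction is about 0.3% and the power uncertainty is between 0.6% and 0.7%.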

To  complete  this  picture  one  must  also  consider  the 
fact  that  most  contributions  to  the  crosstalk  error  are 
coherent  with  the  signals,  and  for  accurate  analysis  Eq. 
(13)  should  be  modified  accordingly. 

For  the  sake  of  brevity  we  considered  in  detail  only 
the  coherence  effects  at  the  detector  plane.  We  should 
keep  in  mind,  however,  that  similar  effects  occur  also 
during  the  hologram  recording  process  and  at  the  SLM 
during reconstruction. Although we assume linear
interaction with the SLM, a redistribution of intensities
due to coherent superposition may contribute to
an increased uncertainty in the interconnection
weights. As on the detector plane, where we investigated
the power redistribution due to the coherent
superposition  of  the  contribution  from  different  SLM 
pixels,  on  the  SLM  plane  we  have  the  same  effect  from 
the  coherent  superposition  of  the  contributions  from 
different  holograms.  However,  as  mentioned  earlier, 
assuming  linear  interaction  at  the  SLM  this  coherent 
superposition  has  no  appreciable  contribution  to  the 
error. 

Some deteriorating coherence effects may be reduced
by adding a random phase mask (Ref. 24) to the SLM as
noted. If the spatial frequency of this phase mask is
higher than $1/p_s$, this will increase the IC requirements
of the hologram, and information will be lost unless the
hologram size is increased. If, however, the phase
mask has a constant phase over each pixel, this may
spread the information more evenly without contributing
much to bandwidth requirements. A phase
modulation of this kind may also reduce the uncertainties
induced by coherent superposition over the various
planes, although some precautions should be exercised
to avoid unnecessary distortions.


VII.  Polarization  Effects  and  SLM  Performance 

Most SLMs that are available today operate on the
polarization of light, and they are designed for near-normal
incidence. If an SLM is to be employed with
light beams having variable angles of incidence, we are
faced with two major effects. The first effect concerns
the angular dependence of the SLM performance itself,
and the second effect is related to the polarization
characteristics of nonplanar wavefronts (Ref. 28). According
to some recent investigations, reflective magnetooptic
SLMs may be designed with reduced angular dependence
(Ref. 29). However, most SLMs available today rely on
the transmission of light through a controlled birefringent
medium, such as a liquid crystal. This birefringent
medium has a thickness $d$ within which the optical
path difference for the two polarization
components, $l(\theta)$, changes with angle. If we assume the
birefringent layer to function as a halfwave plate, we
may write the relation


$$l(0) = (n_e - n_o)d = (2N + 1)\lambda/2, \tag{33}$$

where $l(0)$ is the optical path difference for normal
incidence, $N$ is an integer giving the order of the waveplate,
and $n_o$, $n_e$ are the ordinary and extraordinary
indices, respectively. If a light beam is incident on the
face of the SLM at an angle of incidence $\theta$, then, taking
into account Snell's law, the optical path difference
changes to

$$l(\theta) = \frac{(2N + 1)\lambda}{2\sqrt{1 - \sin^2\theta/n^2}}, \tag{34}$$

where we ignored the splitting of the two polarization
components and took $n$ as some average of $n_e$ and $n_o$.
Assuming a relatively small angle, we may write the
approximate relation

$$l(\theta) \approx l(0)\left(1 + \frac{\theta^2}{2n^2}\right). \tag{35}$$

This relation means that we have introduced a phase
error of the order of

$$\phi \approx (2N + 1)\pi\,\frac{\theta^2}{2n^2}. \tag{36}$$

This phase error approximately determines the value
of the field component emerging from the SLM medium
at the wrong polarization, which will contribute a
fractional error in the power transmitted by the analyzer
of the order of

$$\sin^2\left[(2N + 1)\pi\,\frac{\theta^2}{2n^2}\right]. \tag{37}$$

Polarization errors of this kind may become quite large
for high-order waveplates, but they are rather small for
first-order plates ($N = 0$) provided the angle is not
large.
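
The dependence of this error on waveplate order can be illustrated numerically. The Python sketch below assumes a simple geometric model in which the retardation grows with the path length inside the medium, $1/\sqrt{1 - \sin^2\theta/n^2}$, and compares it with the small-angle form $\theta^2/2n^2$. The average index $n = 1.5$, the 10° test angle, and the function names are illustrative assumptions, not values taken from the text.

```python
import math

def phase_error(theta_deg, order=0, n=1.5):
    """Extra retardation phase (radians) at incidence angle theta.

    Returns (exact, approx): the exact value under the assumed geometric
    path-lengthening model and its small-angle approximation.
    """
    theta = math.radians(theta_deg)
    nominal = (2 * order + 1) * math.pi   # half-wave retardation at normal incidence
    exact = nominal * (1.0 / math.sqrt(1.0 - (math.sin(theta) / n) ** 2) - 1.0)
    approx = nominal * theta ** 2 / (2.0 * n ** 2)
    return exact, approx

def power_error(theta_deg, order=0, n=1.5):
    """Fractional power error, taken as sin^2 of the small-angle phase error."""
    _, approx = phase_error(theta_deg, order, n)
    return math.sin(approx) ** 2

for order in (0, 5):
    print(f"N = {order}: power error at 10 deg = {power_error(10.0, order):.5f}")
```

In this model the error for a plate of order $N = 5$ at 10° is roughly two orders of magnitude larger than for a first-order plate, which illustrates why low-order plates are preferred.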

Special SLM design for a broader acceptance angle
may be helpful in reducing this effect. However, even
with a reduced angular sensitivity of the SLM itself,
polarizers must be employed in conjunction with them,
at least as long as they operate on the polarization state
of light. Thus we should consider the deterioration of
the performance of polarizers with a varying angle of
incidence (Ref. 28). We evaluate this degradation of
performance by first defining a polarization unit vector $\mathbf{P}$
that describes the polarization orientation of light
transmitted by a polarizer for a normally incident light
beam. Denoting the propagation direction of any incident
beam by a unit vector $\hat{k}$, the transmitted beam
will have a polarization orientation along the unit vector
(Ref. 28)

$$\hat{\epsilon} = \frac{\mathbf{P} - \hat{k}(\hat{k}\cdot\mathbf{P})}{\sqrt{1 - (\hat{k}\cdot\mathbf{P})^2}}. \tag{38}$$


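If we now place an analyzer with its polarization vector
$\mathbf{P}'$ following the polarizer, the transmitted field
component $e_t$ will be the projection of $\hat{\epsilon}$ on a similar vector
$\hat{\epsilon}'$, determined by $\mathbf{P}'$ also according to Eq. (38). With
an incident beam of unit amplitude, the transmitted
amplitude component will be

$$e_t = \hat{\epsilon}\cdot\hat{\epsilon}' = \frac{[\mathbf{P} - \hat{k}(\hat{k}\cdot\mathbf{P})]\cdot[\mathbf{P}' - \hat{k}(\hat{k}\cdot\mathbf{P}')]}{\sqrt{1 - (\hat{k}\cdot\mathbf{P})^2}\,\sqrt{1 - (\hat{k}\cdot\mathbf{P}')^2}}. \tag{39}$$
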
Evaluating this scalar product for crossed polarizers
($\mathbf{P} \perp \mathbf{P}'$) we obtain the fraction of field amplitude that
leaks through:

$$|e_t| = \frac{(\mathbf{P}\cdot\hat{k})(\mathbf{P}'\cdot\hat{k})}{\sqrt{1 - (\mathbf{P}\cdot\hat{k})^2}\,\sqrt{1 - (\mathbf{P}'\cdot\hat{k})^2}}. \tag{40}$$

With our definition for the maximum inclination angle
[Eq. (5)], one may show that the maximum value of the
scalar products in this equation is given by (see Fig. 2)

$$\mathbf{P}\cdot\hat{k} = \mathbf{P}'\cdot\hat{k} = \sin\alpha = \frac{\sin\theta_{\max}}{\sqrt{2}}. \tag{41}$$

Considering the amplitude transmission of two
crossed polarizers as given in Eq. (40), we may interpret
the square of this expression as the angle-dependent
extinction ratio that attains its worst value for the
maximal angle of incidence [Eq. (5)]:

$$|e_t|^{2}_{\max} = \left[\frac{\sin^{2}\theta_{\max}}{2 - \sin^{2}\theta_{\max}}\right]^{2} \approx \frac{\theta_{\max}^{4}}{4}, \tag{42}$$

where the approximation holds only for small angles, a
condition that is not always satisfied. This error is comparable
to the first-order (N = 0) SLM power polarization error [Eq. (37)]
that for relatively small angles may be approximated
by


If, for example, we take the large value $\theta_{\max} = 45°$ as for
the  estimations  in  Sec.  IV,  we  obtain  an  extinction 
ratio  of  the  order  of  1/9,  even  with  an  ideal  SLM 
combined  with  ideal  polarizers. 
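
The 1/9 figure can be checked numerically. The following is a minimal sketch, assuming the projected polarization axis has the form $[\mathbf{P} - \mathbf{k}(\mathbf{k}\cdot\mathbf{P})]/\sqrt{1 - (\mathbf{k}\cdot\mathbf{P})^2}$ and that the worst-case beam is tilted by the given angle toward the diagonal between the two polarizer axes, so that $\mathbf{k}\cdot\mathbf{P} = \mathbf{k}\cdot\mathbf{P}' = \sin\theta/\sqrt{2}$. The function names are illustrative and do not come from the paper.

```python
import math

def transmitted_axis(p, k):
    """Polarization axis seen by a beam along unit vector k (assumed Eq. (38) form)."""
    kp = sum(a * b for a, b in zip(k, p))
    norm = math.sqrt(1.0 - kp * kp)
    return [(pi - ki * kp) / norm for pi, ki in zip(p, k)]

def crossed_leakage(theta_deg):
    """Power leakage |e_t|^2 of ideal crossed polarizers at a worst-case tilt."""
    theta = math.radians(theta_deg)
    s = math.sin(theta) / math.sqrt(2.0)
    k = [s, s, math.cos(theta)]            # unit vector tilted toward the x-y diagonal
    p, p_prime = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
    e_t = sum(a * b for a, b in zip(transmitted_axis(p, k),
                                    transmitted_axis(p_prime, k)))
    return e_t * e_t

for angle in (10, 30, 45):
    print(angle, crossed_leakage(angle))
```

Under these assumptions the 45° case reproduces the extinction ratio of 1/9 quoted above, while a 30° acceptance angle gives roughly 2%.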


Fig.  5.  Modified  architecture  with  SLM  S  adjacent  to  the  hologram 
array  H.  P  is  an  optional  high  efficiency  grating  that  may  be 
employed  for  tilting  the  reference  beam. 


Similar polarization effects also play a role during
the recording of a hologram. While the reference
beam has a well-defined polarization, the object beam
is usually not a uniform plane wave. If we adjust the
polarization of the reference beam to fit the polarization
of part of the object wavefront, all beam components
incident on the hologram at different angles will
have a polarization error affecting the reconstruction
beam diffraction efficiency into that direction.

VIII.  Modified  Architectures 

The  most  severe  limitation  in  the  original  architec¬ 
ture  is  due  to  diffraction  crosstalk  and  the  angular 
constraints  of  the  SLM.  In  a  modified  system,  sug¬ 
gested  also  in  Refs.  30  and  31,  the  SLM  is  attached  to 
the  hologram  array,  and  it  is  illuminated  by  a  uniform 
reference  beam  (Fig.  5).  Thus  all  the  angular  varia¬ 
tions  of  light  incident  on  the  SLM  are  eliminated. 
Furthermore, there is no longer a need for large lenses
in the operating system, although they may be needed
for the hologram recording stage.

The system is operated here too by introducing the
input vector $a_{kl}$ in the SLM. Each element of the
hologram array is illuminated by the reference beam
through a corresponding pixel in the SLM, thus with a
reference beam intensity proportional to $a_{kl}$. The
klth hologram diffracts light toward the ijth detector
in the array with an efficiency $t_{klij}$; this detector
receives from the klth hologram light with power
proportional to $t_{klij}a_{kl}$. The overall power detected by this
detector element is the sum of all the contributions
(again ignoring coherence effects):

$$b_{ij} = \sum_{kl} t_{klij}\,a_{kl}. \tag{44}$$

This equation is of the same form as Eq. (1), which
contains $t_{ijkl}$ rather than $t_{klij}$, and will be identical to it
if the new [T] matrix is the transpose of the old one.
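
The index bookkeeping in this weighted sum can be illustrated with a small sketch. It assumes the detected power is $b_{ij} = \sum_{kl} t_{klij} a_{kl}$ and that the original architecture stores the same weights with the index pairs exchanged; the names `detector_powers` and `transpose_weights` and the 2 x 2 example are hypothetical.

```python
def detector_powers(t, a):
    """b[i][j] = sum over k, l of t[k][l][i][j] * a[k][l] (incoherent power sum)."""
    n = len(a)
    return [[sum(t[k][l][i][j] * a[k][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

def transpose_weights(t_old):
    """Re-index t_old[i][j][k][l] (original layout) as t_new[k][l][i][j]."""
    n = len(t_old)
    return [[[[t_old[i][j][k][l] for j in range(n)] for i in range(n)]
             for l in range(n)] for k in range(n)]

# Hypothetical 2 x 2 example: detector (i, j) is connected only to input pixel (j, i).
n = 2
t_old = [[[[1.0 if (k, l) == (j, i) else 0.0 for l in range(n)]
           for k in range(n)] for j in range(n)] for i in range(n)]
a = [[1, 2], [3, 4]]
print(detector_powers(transpose_weights(t_old), a))
```

With this choice of weights the output is the transpose of the input array, which shows that the same interconnection pattern is obtained once the weight tensor is re-indexed.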

One possible procedure to record the hologram array
is similar to the original architecture and uses the same
system. This reintroduces some of the problems discussed
earlier, but they can be dealt with more efficiently
since for every hologram recording, a single
source is used.
The main motivation for this architecture was the
reduction  of  the  angular  constraints  on  the  SLM,  and 
we  have  already  seen  that  it  also  eliminates  the  need 
for  the  lenses.  It  turns  out  that  there  are  also  addi¬ 
tional  advantages:  The  polarization  effects  were  also 
mainly  related  to  the  function  of  the  SLM,  and,  there¬ 
fore,  they  are  absent  here.  Furthermore,  there  is  no 
crosstalk  on  the  SLM  plane  since  it  is  practically  in 
contact  with  the  hologram.  Thus  in  this  architecture 
the  only  crosstalk  (with  an  ideal  SLM  having  infinite 
extinction  ratio)  is  on  the  detector  plane  and  is  deter¬ 
mined  by  the  diffraction  spot  size  of  the  hologram  and 
will  satisfy  the  relation 


and  also  by  coherence  effects  and  aberrations. 

The  pixel  size  of  the  SLM  is  virtually  unlimited  for 
this  architecture  since  each  pixel  serves  only  as  the 
source  of  a  reconstruction  beam  for  a  single  hologram. 
Small  SLMs  available  today  may  be  used  with  a  projec¬ 
tion  optical  system  to  match  its  pixel  size  optically  to 
that  of  the  required  hologram  aperture. 

The  most  obvious  penalty  for  the  benefits  of  this 
architecture  is  the  reduced  flexibility  due  to  the  re¬ 
quirement  that  there  should  be  a  one-to-one  corre¬ 
spondence  between  the  individual  holograms  and 
SLM  pixels.  Thus  overlapping  holograms  are  no  long¬ 
er  allowed,  and  also  the  same  number  of  SLM  pixels  as 
holograms  must  be  used  unless  one  allows  the  illumi¬ 
nation  of  several  holograms  with  a  single  SLM  pixel  or 
vice versa. This state of affairs is useful for many
applications where the overall $N^4$ interconnections are
not required.

Coherent  superposition  at  the  detector  plane  still 
takes  place  as  with  the  original  architecture,  but  here 
the  superposition  is  from  different  holograms  and  not 
from different SLM pixels. To overcome this problem
an  improved  version  of  this  architecture  will  be  even¬ 
tually  possible  with  the  development  of  large  laser 
diode  arrays  that  may  be  able  to  replace  the  SLM. 
The  lasers  in  the  array  will  be  individually  modulated 
to  represent  the  input  vector. 

This  modification  will  lead  to  an  appreciable  reduc¬ 
tion  in  coherent  noise  since  each  hologram  is  illuminat¬ 
ed  by  a  separate  laser  and  each  detector  element  re¬ 
ceives  a  single  contribution  from  each  hologram.  Now 
the  superposed  beams  on  each  detector  element  origi¬ 
nate  from  different  lasers  and  may  be  combined  inco¬ 
herently  (see  also  Appendix  A). 

The  diode  array  configuration  will  be  superior  to  the 
various  SLM  configurations  in  speed,  dynamic  range, 
and  SNR.  The  only  obvious  problem  is  that  such 
arrays  are  not  available  yet;  however,  present  research 
in  this  area  should  provide  the  needed  devices  in  the 
relatively  near  future. 

IX. Illumination Power Requirements

To  estimate  the  laser  power  requirements  we  denote 
the  minimum  detectable  power  by  w0.  If  we  allow  n 
interconnection  weight  levels,  we  would  ideally  like  to 
have  a  maximum  power  available  in  each  channel  at 


the detector plane equal to $n \times w_0$. Each hologram in
the hologram array diffracts light into $N_s^2$ channels,
and we have $N_h^2$ such holograms (and, of course, also
$N_d^2 = N_h^2$ detectors). Thus the maximum total diffracted
power we need at the detector plane will be
given by

$$W_{\max} = N_s^2 N_h^2\, n\, w_0. \tag{46}$$

Denoting by $\eta$ an overall efficiency of the reconstruction
process (including hologram diffraction efficiency,
useful hologram area, SLM, and detector cross-sections
as well as other losses in the system), we arrive
at a laser power of


This  power  requirement  may  be  very  high  although 
affordable.  It  turns  out,  however,  that  we  really  do  not 
need  such  high  power  levels  since  this  calculation  as¬ 
sumes  a  detection  dynamic  range  of  N^N^n,  which  may 
become  also  too  high  for  any  reasonable  detector. 
Furthermore,  if  we  take  into  account  the  unavoidable 
noise  level  given  by  Eq.  (25),  we  may  conclude  that 
there  is  no  sense  in  requiring  a  detection  level  which  is 
lower  than  this  noise.  Thus,  assuming  that  in  Eq.  ( 25) 
we  always  have  <<*  >  1  for  a  realistic  system,  we  may 
require  a  detection  limit  of  only  <<*  and  decision  levels 
also  not  closer  than  this  value.  Basing  this  realistic 
approach  on  the  considerations  that  lead  to  Eq.  (25)  we 
may derive a new value for the laser power: We set the
minimum  detectable  power  equal  to  the  maximum 
crosstalk  noise. 


'481 

where I is the power received by the detector from a
fully weighted channel. The maximum power received
by a single detector will be when all channels are
addressed to it, that is, $N_s^2 I$. Therefore, we may assume
a worst case number of decision levels to be


— - >  — -  .  1491 

The number of decision levels multiplied by the detector
sensitivity $w_0$ gives the maximum required power
on the detector. For $N_d^2$ such detectors and taking into
account the overall efficiency, we obtain the total laser
power

X. Design Considerations

Design  parameters  for  an  actual  system  are  very 
strongly  application  dependent,  and  before  attempt¬ 
ing  any  design  procedure  one  should  answer  several 
questions: What is the dimension of the vectors to be
processed? What is the minimum acceptable number
of interconnection weight levels? What is the minimum
number of detection levels? What is an acceptable
error? Some of the answers to these questions
may turn out to be incompatible due to the limitations
derived in this paper, and compromises should be
made. For example, we have seen that with an acceptance
angle of 45° the extinction ratio of an SLM
system  will  be  worse  than  1:9,  and  crosstalk  may  be  at 
least  a  few  percent.  This  limits  the  number  of  weight 
levels  to  ~10.  A  smaller  more  realistic  angle  may 
provide  a  larger  number  of  weight  levels,  but  then  the 
number  of  matrix  elements  is  greatly  reduced. 

As  a  case  study  we  attempt  to  design  a  neural  net¬ 
work  with  input  and  output  vectors  of  256  X  256  ele¬ 
ments.  Several  neural  network  algorithms  use  a 
thresholded  two-level  output  but  with  interconnection 
weights  requiring  a  dynamic  range  as  large  as  possible. 

Hologram arrays of this size have already been
implemented,10,11 and what remains for us here is to
determine their limitations. Taking in Eq. (11) $l = 1$
μm and $g_s = g_h$ we obtain $p_h = 256$ μm as was actually
implemented (with the lower limit) in the above references.
To accommodate also the off-axis carrier for
the hologram recording we need at least another factor
of 2 that leads to $p_h \approx 500$ μm.

If we use rectangular pixels, the various sizes will be
determined by the allowable crosstalk using calculations
like those for the plots of Fig. 4. Taking an F/no.
= 1 leads to $\theta_{\max} \approx 30°$. Considering the architecture
of Fig. 1 with an SLM layout similar to Fig. 3 we may
attempt to choose a value $d_s/p_s = 2$ with $p_s$ matching
the central lobe of the hologram diffraction pattern.
Using the calculated results of Sec. VI, we obtain a
fractional crosstalk from a single neighboring pixel of
the order of $\epsilon_f \approx 0.75\%$ and $M\epsilon_f \approx 4\%$. The polarization
errors from relations (42) and (43) will be of the order
of 4.5% assuming $n \approx 1.5$. The total uncertainty in the
weights is thus close to 10%, and there is not much
sense in requiring more than ten weight levels.

Returning  now  to  the  analysis  of  Sec.  IV  we  may 
modify  relation  (8)  to 


For 256 pixels with nonoverlapping holograms we
need $H/p_h = 256$ leading to $d_s/\lambda \approx 836$. With $\lambda = 0.5$
μm we end up with $d_s = 418$ μm. This value is larger
than available in most SLMs, and we may overcome
this difficulty by partially overlapping holograms: A
value of $p_h = 2d_s$ brings us into a practical domain.

To consider the crosstalk over the detector plane we
assume a detector layout similar to Fig. 3, keeping the
condition of Eq. (51) with $d_d$ replacing $d_s$ and $p_s$ replacing
$p_h$. In these conditions we obtain a maximum
estimated error given by Eq. (25) to be $\sim 256^2 \times 4\%\,I \approx
2600\,I$ with I denoting the full weighted single-channel
interconnection power. At first sight this appears to
be a formidable error, but it still constitutes just 4% of
the maximum power to be detected, and we may use
twenty-five decision levels.

If  we  convert  to  the  modified  architectures  of  Fig.  5 
we  may  forget  about  the  crosstalk  over  the  SLM  and  its 
angular  dependence,  although  special  precautions 
should be taken during hologram recording. By Eq.
(51) we can replace $d_s$ by $d_d$ and note that the detector
error remains essentially the same.


The power requirements can be estimated from Eq.
(49). Taking the estimated $M\epsilon_f \approx 4\%$ and assuming
an overall light efficiency $\eta = 0.1$, we shall need a
laser power of $1.64 \times 10^{7}\,w_0$. With high sensitivity
detectors $w_0 \approx 10^{-9}$ W we need a laser power of ~16
mW, which implies that laser power is not a problem,
and one may employ detectors of lower sensitivity
when used in conjunction with higher power lasers.
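
The quoted figures can be reproduced with a short calculation. This is a sketch under a stated assumption: since the final power expression is not legible in this copy, the code takes the number of decision levels as the reciprocal of the fractional crosstalk and multiplies it by the number of detectors and the detector sensitivity, divided by the overall efficiency. The function name and argument names are illustrative.

```python
def laser_power(n_side, crosstalk, efficiency, w0):
    """Assumed estimate: detectors * decision levels * sensitivity / efficiency."""
    levels = 1.0 / crosstalk        # maximum signal divided by crosstalk noise
    detectors = n_side ** 2         # one detector per output channel
    return detectors * levels * w0 / efficiency

in_units_of_w0 = laser_power(256, 0.04, 0.1, 1.0)
in_watts = laser_power(256, 0.04, 0.1, 1e-9)
print(in_units_of_w0, in_watts)
```

With 256 x 256 detectors, 4% crosstalk, and 10% efficiency this gives about $1.64 \times 10^7\,w_0$, or roughly 16 mW for nanowatt-sensitivity detectors, in agreement with the numbers in the text.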

The required detector is also available. If we take
the obvious detector size value $d_d = d_s = 450$ μm
leading to $D = 450\ \mu\text{m} \times 256 = 11.52$ cm we cannot use the
commercial CCD arrays, but we can use solar cells,
vidicons, or image dissectors for detection and readout.

To conclude this section we may safely state that
$256^4 \approx 4.3 \times 10^{9}$ interconnections between $256^2 = 6.55
\times 10^{4}$ channels with at least 10 weight levels and
twenty-five output levels are possible with presently
available devices. Assuming a TV rate of 30 frames/s
we may perform $30 \times 256^4 \approx 1.3 \times 10^{11}$ interconnections/s.
This number may be increased at least by 1
order of magnitude at the expense of the number of
weight levels (or just the decision levels in the modified
architecture). With progress in the technology of
SLMs we may expect one more order of magnitude,
and the limit is still greater once large and fast laser
diode arrays are available.

Assuming a conservative number of 3-bit weight
levels leads to a total IC stored in the hologram
array of $3 \times 256^4 \approx 1.3 \times 10^{10}$ bits.
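
The counts in this summary follow from simple integer arithmetic, sketched below. The variable names are illustrative; the frame rate and bits per weight are the values stated in the text.

```python
n_side = 256
channels = n_side ** 2                 # 256 x 256 input (and output) channels
interconnections = channels ** 2       # every input connected to every output
rate_per_second = 30 * interconnections    # TV rate of 30 frames/s
stored_bits = 3 * interconnections         # 3-bit weight per interconnection

print(channels, interconnections, rate_per_second, stored_bits)
```

The exact values are 65,536 channels, about $4.29 \times 10^9$ interconnections, about $1.29 \times 10^{11}$ interconnections/s, and about $1.29 \times 10^{10}$ bits, consistent with the rounded figures above.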

XI.  Conclusions 

We  have  shown  in  this  work  that  it  is  possible  to 
implement  with  existing  devices  a  holographic  weight¬ 
ed  interconnection  network  with  256  X  256  channels. 
The  various  constraints  have  been  analyzed  in  detail, 
and  proper  design  parameters  were  evaluated.  Before 
implementing  a  working  system,  however,  a  few  impor¬ 
tant  remarks  should  be  observed: 

In  the  estimation  of  constraints  and  errors  we  based 
most  of  the  calculations  on  worst  case  conditions. 
Therefore,  we  may  expect  actual  performance  levels  to 
be  much  better  than  stated  here  because  worst  case 
parameter  values  are  seldom  encountered.  Further¬ 
more,  the  appearance  of  all  the  worst  case  parameters 
together  will  be  extremely  rare. 

We  have  described  two  architectures,  and  at  this 
point  it  is  difficult  to  say  which  is  preferred  since 
preference  appears  to  be  application  dependent. 
While  in  the  modified  architecture  one  may  definitely 
use  a  larger  number  of  weight  levels,  in  the  first  archi¬ 
tecture  more  channels  may  be  incorporated  due  to  the 
possibility  of  overlapping  holograms. 

In  many  applications  one  does  not  need  all  the  possi¬ 
ble  interconnections  as  in  the  human  brain  where  only 
a  very  small  fraction  of  the  total  number  of  neurons  are 
actually  interconnected.  For  cases  like  this  one  may 
arrange  the  channels  in  the  array  so  that  most  of  the 
interconnections  are  made  among  nearby  channels 
from the constraints point of view. Such a sectioning
of the array will allow a significant increase in the
overall array size since our constraints must be kept
only within a single section.
Finally, we should point out that in this application
the holographic reconstruction produces the complex
conjugate of the writing wavefront. Therefore, while
propagating through the optical system most phase
distortions are compensated if they are the same as
during hologram recording. This means that the optical
components do not have to be of the highest quality,
although they must perform adequately well for the
hologram recording.

Appendix  A:  Paraxial  System  Analysis 

The  purpose  of  this  Appendix  is  evaluation  of  the 
performance  of  an  ideal  system  to  realize  the  limita¬ 
tions  introduced  by  fundamental  physical  processes. 
We  calculate  the  system  transfer  characteristics  using 
paraxial  approximation  and  the  operator  notation 
which  is  summarized  in  Appendix  B.  We  also  assume 
ideal  SLM  performance  and  hologram  reconstruction. 
To  simplify  the  expressions  we  ignore  constant  phase 
and  amplitude  factors  that  will  affect  the  signal  and 
noise  in  the  same  way;  i.e.,  they  are  not  important  for 
this  discussion.  In  any  case,  if  these  factors  are  need¬ 
ed,  they  can  be  reinstalled  by  using  simple  physical 
considerations. 

We  start  these  Appendices  from  the  hologram  re¬ 
cording  stage,  which  is  the  same  for  the  original  archi¬ 
tecture  as  well  as  for  the  modified  architectures  and 
then  discuss  separately  the  three  configurations. 

A.  Hologram  Recording 

For  a  discussion  of  the  hologram  recording  stage  we 
return  to  Fig.  1  and  the  notations  used  in  the  main  text. 
We  introduce  a  light  source  with  complex  amplitude 
u(x,y) at the location of the ijth detector. Without
losing generality we may simplify the notation by assuming
$f_1 = f_2 = f$. The complex amplitude $U_s$ incident
on the SLM can be expressed in operator form by
the relation

$$U_s = \mathcal{Q}[-1/f]\,\mathcal{R}[f]\,\mathcal{S}[i d_d\hat{\mathbf{x}} + j d_d\hat{\mathbf{y}}]\,u(x,y), \tag{A1}$$

where we introduced the shift operator $\mathcal{S}$ [Eq. (B6)] to
represent the position of the source and denoted by $\hat{\mathbf{x}}$
and $\hat{\mathbf{y}}$ the unit vectors along x and y, respectively. The
input complex amplitude is operated on by $\mathcal{R}[f]$, the
free space propagation operator (FPO) [Eqs. (B17)-
(B19)] through a distance f and is finally multiplied by
the quadratic phase factor $\mathcal{Q}[-1/f]$ [Eq. (B1)], which is
the transfer operator of an ideal thin lens [Eq. (B7)].
This distribution is multiplied by the SLM transfer
function specified for the ijth recording,

$$h_{ij}(x,y) = \sum_{kl} t_{ijkl}\,\mathcal{S}[k d_s\hat{\mathbf{x}} + l d_s\hat{\mathbf{y}}]\,\mathrm{rect}(x/p_s,\,y/p_s), \tag{A2}$$

where we again employed the shift operator to place a
rectangular window function at the proper position of
each pixel. The distribution transmitted by the SLM is
again transformed by the second lens, propagated a distance f, and
finally recorded on the hologram at position ij as

$$U_{ij} = \mathcal{R}[f]\,\mathcal{Q}[-1/f]\,h_{ij}(x,y)\,U_s. \tag{A3}$$

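Returning to Eq. (A1) we substitute Eq. (B18) for
the FPO and use Eq. (B8) to obtain

$$U_s = \mathcal{V}[1/\lambda f]\,\mathcal{F}\,\mathcal{Q}[1/f]\,\mathcal{S}[\mathbf{d}_{ij}]\,u(x,y), \tag{A4}$$

where we defined

$$\mathbf{d}_{ij} = i d_d\hat{\mathbf{x}} + j d_d\hat{\mathbf{y}}. \tag{A5}$$

Using Eq. (B13) to commute the shift operator with
the quadratic phase we obtain

$$U_s = \mathcal{V}[1/\lambda f]\,\mathcal{F}\,\mathcal{S}[\mathbf{d}_{ij}]\,\mathcal{G}[\mathbf{d}_{ij}/f]\,\mathcal{Q}[1/f]\,u(x,y). \tag{A6}$$

The operation of the Fourier transform (FT) operator
is evaluated by using Eqs. (B15) and (B16) to yield

$$U_s = \mathcal{V}[1/\lambda f]\,\mathcal{G}[-\lambda\mathbf{d}_{ij}]\,\mathcal{S}[\mathbf{d}_{ij}/\lambda f]\,\mathcal{F}\,\mathcal{Q}[1/f]\,u(x,y). \tag{A7}$$

Operating with the scaling operator we obtain

$$U_s = \mathcal{G}[-\mathbf{d}_{ij}/f]\,\mathcal{S}[\mathbf{d}_{ij}]\,\mathcal{V}[1/\lambda f]\,\mathcal{F}\,\mathcal{Q}[1/f]\,u(x,y). \tag{A8}$$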

If the source is an ideal point source, i.e., $u(x,y) \sim
\delta(x,y)$, the quadratic phase factor is eliminated by the
sifting theorem, and, recalling that the FT of a delta
function is unity, we obtain a displaced plane wave
traveling at an angle defined by the position of the
point source:

$$U_s = \mathcal{G}[-\mathbf{d}_{ij}/f]\,\mathcal{S}[\mathbf{d}_{ij}]. \tag{A9}$$

Naturally,  the  displacement  of  a  plane  wave  has  no 
meaning.  Thus  we  are  left  with  a  linear  phase  factor  as 
it  should  be.  The  finite  extent  of  the  source  intro¬ 
duces  an  apodizing  factor  to  the  illumination  of  the 
SLM  plane.  Thus  the  illumination  of  the  SLM  will 
not  be  uniform,  and  this  nonuniformity  will  be  shifted 
according  to  source  location.  Usually  the  source  will 
be  approximately  a  Gaussian  source,  and  the  complex 
amplitude  distribution  over  the  SLM  plane  can  be 
predetermined  and  partially  compensated  for.  Keep¬ 
ing  this  in  mind  we  proceed  with  the  assumption  of  an 
ideal  point  source. 

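Substitution into Eq. (A3) together with Eq. (B18)
leads to

$$U_{ij} = \mathcal{Q}[1/f]\,\mathcal{V}[1/\lambda f]\,\mathcal{F}\,h_{ij}(x,y)\,\mathcal{G}[-\mathbf{d}_{ij}/f]. \tag{A10}$$

Performing the FT operation we obtain

$$U_{ij} = \mathcal{Q}[1/f]\,\mathcal{V}[1/\lambda f]\,\mathcal{S}[-\mathbf{d}_{ij}/\lambda f]\,H_{ij} = \mathcal{Q}[1/f]\,\mathcal{S}[-\mathbf{d}_{ij}]\,\mathcal{V}[1/\lambda f]\,H_{ij}, \tag{A11}$$

where $H_{ij}$ is the FT of $h_{ij}$. If we need to reinstall the
source distribution we should substitute here and in
the following:

$$\mathcal{V}[1/\lambda f]\,H_{ij} \rightarrow \left(\mathcal{V}[1/\lambda f]\,H_{ij}\right) * \left\{\mathcal{G}[-\mathbf{d}_{ij}/f]\,\mathcal{V}[-1]\,\mathcal{Q}[1/f]\,u(x,y)\right\}, \tag{A12}$$

where we used some operator algebra and the * denotes
convolution. The complex amplitude of Eq. (A11) is
recorded as the ijth hologram through an aperture or
more generally through some window function w(x,y)
that apart from a limiting aperture may also include
some apodization function. In most cases, however,
this window function will be of the form

$$w(x,y) = \mathrm{rect}\!\left(\frac{x}{p_h},\,\frac{y}{p_h}\right). \tag{A13}$$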

It should be noted here that a reinstallation of the
finite source size would contribute a convolution over
the hologram plane tending to spread the flux and
reduce the information content of the system.

B. Reconstruction and Operation-Original Architecture

The hologram array is reconstructed with the conjugate
of the reference beam. Thus, for each hologram,
we reconstruct the complex conjugate of an expression
like Eq. (A11) multiplied by the window function
w(x,y). The reconstructed beam is propagated
through a distance f, multiplied by the lens transfer
function and then by a new function g(x,y), written on
the SLM. This new function is the input vector to the
processing system,

$$g(x,y) = \sum_{mn} g_{mn}\,\mathcal{S}[m d_s\hat{\mathbf{x}} + n d_s\hat{\mathbf{y}}]\,\mathrm{rect}(x/p_s,\,y/p_s). \tag{A14}$$

After multiplication by g(x,y) the complex amplitude
is transformed by the second lens and propagated
a distance f to the detector plane that will receive the
complex amplitude from hologram ij,

$$U^{d}_{ij} = \mathcal{R}[f]\,\mathcal{Q}[-1/f]\,g(x,y)\,\mathcal{Q}[-1/f]\,\mathcal{R}[f] \times \left\{\mathcal{S}[\mathbf{d}^{h}_{ij}]\,w(x,y)\right\}\mathcal{Q}[-1/f]\,\mathcal{S}[-\mathbf{d}_{ij}]\,\mathcal{V}[1/\lambda f]\,H^{*}_{ij}, \tag{A15}$$

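where the complex conjugate of Eq. (A11) has been
substituted and modulated by the properly positioned
window function. The overall field distribution on the
detector plane will be the coherent summation from all
the holograms,

$$U^{d} = \sum_{ij} U^{d}_{ij}. \tag{A16}$$

Using Eq. (B18) and then Eq. (B8) in Eq. (A15) leads to

$$U^{d}_{ij} = \mathcal{Q}[1/f]\,\mathcal{V}[1/\lambda f]\,\mathcal{F}\,g(x,y)\,\mathcal{V}[1/\lambda f]\,\mathcal{F}\,\mathcal{S}[-\mathbf{d}_{ij}]\,w(x,y)\,\mathcal{V}[1/\lambda f]\,H^{*}_{ij}, \tag{A17}$$

where we took into account that the hologram plane is
the inverted image of the detector plane with unit
magnification, that is,

$$\mathbf{d}^{h}_{ij} = \mathbf{d}_{-i,-j} = -\mathbf{d}_{ij}, \tag{A18}$$

and employed the definition of the shift operator [Eq.
(B6)]. Commuting the shift operator with the FT
operator using Eqs. (B14) and (B16), we obtain

$$U^{d}_{ij} = \mathcal{Q}[1/f]\,\mathcal{V}[1/\lambda f]\,\mathcal{F}\,g(x,y)\,\mathcal{V}[1/\lambda f]\,\mathcal{G}[\lambda\mathbf{d}_{ij}]\,\mathcal{F}\,w(x,y)\,\mathcal{V}[1/\lambda f]\,H^{*}_{ij}. \tag{A19}$$

Moving now the linear phase factor more to the left
we may write

$$U^{d}_{ij} = \mathcal{Q}[1/f]\,\mathcal{S}[\mathbf{d}_{ij}]\,\mathcal{V}[1/\lambda f]\,\mathcal{F}\,g(x,y)\,\mathcal{V}[1/\lambda f]\,\mathcal{F}\,w(x,y)\,\mathcal{V}[1/\lambda f]\,H^{*}_{ij}. \tag{A20}$$

Now we may translate the right-hand side scaling operator
to the left and combine it with the middle one to
obtain

$$U^{d}_{ij} = \mathcal{Q}[1/f]\,\mathcal{S}[\mathbf{d}_{ij}]\,\mathcal{V}[1/\lambda f]\,\mathcal{F}\,g(x,y)\,\mathcal{F}\,w(\lambda f x,\,\lambda f y)\,H^{*}_{ij}. \tag{A21}$$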

The  overall  process  generated  an  extra  quadratic 
phase  factor  that  is  not  important  for  the  detection 
while  the  shift  operator  places  the  center  of  the  distri¬ 
bution  at  the  proper  pixel  on  the  detector.  The  signal 
to  be  detected  and  the  amount  of  crosstalk  to  be  ex¬ 
pected  are  determined  by  the  rest  of  the  expression. 
To analyze the various contributions of the functions
involved let us start with an unlimited hologram
recording, that is, w(x,y) = 1. To shorten the equations
we define a new amplitude U' by the relation

$$U^{d}_{ij} = \mathcal{Q}[1/f]\,\mathcal{S}[\mathbf{d}_{ij}]\,U'. \tag{A22}$$

Putting w(x,y) = 1 and performing the right-hand side
FT operation in Eq. (A21) we obtain

$$U' = \mathcal{V}[1/\lambda f]\,\mathcal{F}\left\{g(x,y)\,h^{*}_{ij}(x,y)\right\}, \tag{A23}$$

which essentially is the cross-correlation of the FT of
g(x,y) and h(x,y). With our special functions [Eqs.
(A2) and (A14)], however, it is easier to substitute
them right away and observe that they are real and the
rect function is symmetric. Both functions are composed
of the same rect function with terms shifted to
various positions. Since the centers are spaced at
distances that are multiples of $d_s$, which is never smaller
than the extent $p_s$ of the function itself, the only nonzero
terms in the product of the two sums are those
having kl = mn. Furthermore, the square of the rect
function is also the function itself; thus we may write

$$U' = \mathcal{V}[1/\lambda f]\,\mathcal{F}\sum_{kl} g_{kl}\,t_{ijkl}\,\mathcal{S}[\mathbf{d}^{s}_{kl}]\,\mathrm{rect}(x/p_s,\,y/p_s), \tag{A24}$$

where we introduced a position vector $\mathbf{d}^{s}_{kl}$ similar to Eq.
(A5). Performing the Fourier transformations and
the scaling operations leads to the final expression

$$U' = \sum_{kl} g_{kl}\,t_{ijkl}\,\mathcal{G}[-\mathbf{d}^{s}_{kl}/f]\,\mathrm{sinc}\!\left(\frac{x p_s}{\lambda f},\,\frac{y p_s}{\lambda f}\right). \tag{A25}$$

In a real situation we should reinstall the source distribution
u(x,y), which should be convolved with the sinc
function to obtain the final distribution.
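
The width of the sinc distribution sets how much light from one channel spills onto neighboring detectors. The sketch below evaluates the relative intensity of a single term along one axis, assuming the normalized convention sinc(u) = sin(πu)/(πu) so that the first zero falls at x = λf/p_s. The pixel width, wavelength, and focal length are hypothetical values chosen only for illustration, not parameters from the paper.

```python
import math

def sinc(u):
    """Normalized sinc, sin(pi u) / (pi u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def relative_intensity(x, pixel_width, wavelength, focal_length):
    """Intensity of one sinc term at detector coordinate x, relative to its peak."""
    return sinc(x * pixel_width / (wavelength * focal_length)) ** 2

P_S, WAVELENGTH, FOCAL = 100e-6, 0.5e-6, 0.2     # hypothetical values in meters
first_zero = WAVELENGTH * FOCAL / P_S            # position of the first null

for x in (0.0, 0.5 * first_zero, first_zero, 1.5 * first_zero):
    print(x, relative_intensity(x, P_S, WAVELENGTH, FOCAL))
```

For these assumed numbers the first null lies 1 mm from the spot center, and the first sidelobe peak carries about 4.5% of the central intensity, which indicates why detector spacing relative to the sinc width matters for crosstalk.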

Some of the consequences of relation (A25) are discussed
further in the main text while here we continue
by returning to the complete expression that contains
the window function but assuming a source distribution
much smaller than the sinc distribution. If we do
this, instead of Eq. (A23) we have

$$U' = \mathcal{V}[1/\lambda f]\,\mathcal{F}\,g(x,y)\,\mathcal{F}\,w(\lambda f x,\,\lambda f y)\,H^{*}_{ij}. \tag{A26}$$

If we consider only the right-hand side FT we are
essentially on the SLM plane, and we see that there the
distribution is given by a convolution of the FT of the
window function with the interconnection function h.
Thus this distribution is widened, contributing to the
crosstalk over the SLM plane. However, in the final
distribution, which may be written in the form

$$U' = \mathcal{V}[1/\lambda f]\left\{\mathcal{F}\,g(x,y)\right\} * \left\{w(x,y)\,\mathcal{V}[1/\lambda f]\,H^{*}_{ij}\right\}, \tag{A27}$$

where * denotes convolution, the window function
actually reduces the crosstalk over the detector plane
since the width of $H_{ij}$ is cut by the window function
before performing the convolution operation with the
FT of the input vector g.

C.  Reconstruction  and  Operation-Modified  Architectures 

In the modified architectures (Fig. 5) the hologram
recording process is similar to that of the original
configuration. Thus we may start here also from the
reconstruction where now each hologram is multiplied
directly by its input vector element. The complex
amplitude is now propagated a distance f, multiplied
by the lens transfer function, and detected on the
detector plane. The lens has no effect on our results
here, but it is useful in correcting for a quadratic phase
factor to reduce the variation in the angle of incidence
over the detector plane. Denoting the complex amplitude
of the light transmitted by the ijth element of the
input vector by $g_{ij}$, it reconstructs the ijth hologram to
generate the distribution over the detector plane given
by

$$U^{d}_{ij} = \mathcal{Q}[-1/f]\,\mathcal{R}[f]\,g_{ij} \times \left\{\mathcal{S}[-\mathbf{d}_{ij}]\,w(x,y)\right\}\mathcal{Q}[-1/f]\,\mathcal{S}[-\mathbf{d}_{ij}]\,\mathcal{V}[1/\lambda f]\,H^{*}_{ij}, \tag{A28}$$

where we introduced again the shifted window function
that delineates the hologram (and now also the
SLM pixel) size. Substituting Eq. (B18) for the FPO,
using Eq. (B8), and taking into account that $g_{ij}$ is just a
constant and $\{\mathcal{S}[-\mathbf{d}_{ij}]\,w(x,y)\}$ is a scalar function, we
obtain

$$U^{d}_{ij} = g_{ij}\,\mathcal{V}[1/\lambda f]\,\mathcal{F}\left\{\mathcal{S}[-\mathbf{d}_{ij}]\,w(x,y)\right\}\mathcal{S}[-\mathbf{d}_{ij}]\,\mathcal{V}[1/\lambda f]\,H^{*}_{ij}. \tag{A29}$$

Performing the scaling and FT operations leads to

$$U^{d}_{ij} = g_{ij}\,\mathcal{G}[-\mathbf{d}_{ij}/f]\left[\left\{\mathcal{V}[1/\lambda f]\,\mathcal{F}\,w(x,y)\right\} * h_{ij}(-x,-y)\right]. \tag{A30}$$

Taking into account that h is a real function, composed
of symmetric terms (the rect functions), we reconstructed
the original h function multiplied by the vector
element $g_{ij}$, but each pixel distribution is now widened
due to the convolution with the FT of the window
function. This convolution will cause a crosstalk by
extending the distribution from each detector pixel
into adjacent pixels. To obtain the complete distribution
over the detector plane one must sum all the
contributions coherently,

$$U^{d} = \sum_{ij} U^{d}_{ij}. \tag{A31}$$

If we substitute Eq. (A2) we obtain a relation similar
to Eq. (A25), but this time the linear phase factors
originate from the hologram position. The advantage
of the laser diode array architecture is that these phase
factors are canceled due to the incoherent superposition,

$$I^{d} = \sum_{ij} \left|U^{d}_{ij}\right|^{2}. \tag{A32}$$

Appendix  B:  Summary  of  Operator  Algebra 

In  this  Appendix  we  summarize  the  definitions  and 
relevant  relations  of  the  operator  algebra.  For  sim¬ 
plicity  we  shall  ignore  all  constant  factors  (phase  and 
amplitude)  that  are  irrelevant  for  the  discussions  in 
this  paper  since  we  are  interested  in  the  complex  am¬ 
plitude  distributions  and  not  their  absolute  magni¬ 
tudes  that  can  be  estimated  from  simple  consider¬ 
ations. 

Assuming for all operations a general complex function
f(x,y), we define the basic operators as follows.
Quadratic phase factor:

$$\mathcal{Q}[a] = \exp\!\left(\frac{jk}{2}\,a\rho^{2}\right), \tag{B1}$$

with $k = 2\pi/\lambda$, and $\boldsymbol{\rho}$ denoting the transversal coordinate

$$\boldsymbol{\rho} = x\hat{\mathbf{x}} + y\hat{\mathbf{y}}; \qquad \rho = |\boldsymbol{\rho}|. \tag{B2}$$

Linear phase factor:

$$\mathcal{G}[\mathbf{s}] = \exp(jk\,\mathbf{s}\cdot\boldsymbol{\rho}). \tag{B3}$$

A scaling operator $\mathcal{V}[a]$ is defined by the relationship

$$\mathcal{V}[a]\,f(x,y) = f(ax,\,ay), \tag{B4}$$

and the Fourier transform operator is defined by the
integral

$$\mathcal{F}f(x,y) = \iint f(x',y')\,\exp[-2\pi j(xx' + yy')]\,dx'\,dy'. \tag{B5}$$

The shift operator is defined by the equation

$$\mathcal{S}[\mathbf{m}]\,f(x,y) = f(x - m_x,\,y - m_y). \tag{B6}$$

The transfer operator of an ideal thin lens of focal
length f is

$$\mathcal{L}[f] = \mathcal{Q}[-1/f]. \tag{B7}$$

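Some basic relations are evident from the definitions
of the basic operators:

$$\mathcal{Q}[a]\,\mathcal{Q}[b] = \mathcal{Q}[a + b], \tag{B8}$$

$$\mathcal{V}[a]\,\mathcal{V}[b] = \mathcal{V}[ab], \tag{B9}$$

$$\mathcal{V}[a]\,\mathcal{Q}[b] = \mathcal{Q}[a^{2}b]\,\mathcal{V}[a], \tag{B10}$$

$$\mathcal{V}[a]\,\mathcal{S}[\mathbf{m}] = \mathcal{S}[\mathbf{m}/a]\,\mathcal{V}[a], \tag{B11}$$

$$\mathcal{V}[b]\,\mathcal{G}[\mathbf{m}] = \mathcal{G}[\mathbf{m}b]\,\mathcal{V}[b], \tag{B12}$$

$$\mathcal{Q}[a]\,\mathcal{S}[\mathbf{m}] = \mathcal{S}[\mathbf{m}]\,\mathcal{G}[a\mathbf{m}]\,\mathcal{Q}[a]. \tag{B13}$$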

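As stated above, constant factors have been ignored in
some of these equations and also in the following.
Using Fourier analysis we can show that

$$\mathcal{V}[b]\,\mathcal{F} = \mathcal{F}\,\mathcal{V}[1/b], \tag{B14}$$

$$\mathcal{F}\,\mathcal{G}[\mathbf{s}] = \mathcal{S}[\mathbf{s}/\lambda]\,\mathcal{F}, \tag{B15}$$

$$\mathcal{F}\,\mathcal{S}[\mathbf{m}] = \mathcal{G}[-\lambda\mathbf{m}]\,\mathcal{F}. \tag{B16}$$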

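Free space propagation, i.e., the Fresnel-Kirchhoff
integral, is described by the FPO, which can be expressed
in various ways by the basic operators:

$$\mathcal{R}[d] = \mathcal{F}^{-1}\mathcal{Q}[-\lambda^{2}d]\,\mathcal{F} = \mathcal{F}\,\mathcal{Q}[-\lambda^{2}d]\,\mathcal{F}^{-1}, \tag{B17}$$

where d is the propagation distance. Another useful
expression is

$$\mathcal{R}[d] = \mathcal{Q}[1/d]\,\mathcal{V}[1/(\lambda d)]\,\mathcal{F}\,\mathcal{Q}[1/d], \tag{B18}$$

and for large distances an asymptotic expression may
be also employed:

$$\lim_{d\to\infty}\mathcal{R}[d] = \lim_{d\to\infty}\mathcal{V}[1/(\lambda d)]\,\mathcal{F}. \tag{B19}$$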

The FPO satisfies the cascading property

$$\mathcal{R}[a]\,\mathcal{R}[b] = \mathcal{R}[a+b]. \tag{B20}$$

A simple optical system containing a single lens may
satisfy the Fourier transforming condition:


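$$\mathcal{R}[f]\,\mathcal{Q}[-1/f]\,\mathcal{R}[d] = \mathcal{Q}\left[\frac{1}{f}\left(1-\frac{d}{f}\right)\right]\mathcal{V}\left[\frac{1}{\lambda f}\right]\mathcal{F}, \tag{B21}$$

that becomes exact for d = f. Alternatively, the imaging
condition,

$$1/a + 1/b = 1/f, \tag{B22}$$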


yields

$$\mathcal{R}[a]\,\mathcal{Q}[-1/f]\,\mathcal{R}[b] = \mathcal{Q}[\,\cdots\,]\,\mathcal{V}[-a/b]. \tag{B23}$$


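With the basic relations we can also show that

$$\mathcal{R}[d]\,\mathcal{Q}[1/q] = \mathcal{Q}[1/(d+q)]\,\mathcal{V}[1/(1+d/q)]\,\mathcal{R}[(1/d+1/q)^{-1}], \tag{B24}$$

$$\mathcal{V}[b]\,\mathcal{R}[d] = \mathcal{R}[d/b^{2}]\,\mathcal{V}[b]. \tag{B25}$$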
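
As a worked check of how the basic relations combine, the Fourier transforming condition (B21) can be recovered from (B17), (B18), (B8), and (B10). The steps below are a sketch under two assumptions that are not stated explicitly in the text: the rightmost operator acts first, and the object lies a distance d in front of the lens with the output taken in the back focal plane. Constant factors are ignored, as elsewhere in this Appendix.

```latex
\begin{align*}
\mathcal{R}[f]\,\mathcal{Q}[-1/f]\,\mathcal{R}[d]
 &= \mathcal{Q}[1/f]\,\mathcal{V}[1/(\lambda f)]\,\mathcal{F}\,
    \mathcal{Q}[1/f]\,\mathcal{Q}[-1/f]\;
    \mathcal{F}^{-1}\mathcal{Q}[-\lambda^{2}d]\,\mathcal{F}
    && \text{(B18) for } \mathcal{R}[f],\ \text{(B17) for } \mathcal{R}[d] \\
 &= \mathcal{Q}[1/f]\,\mathcal{V}[1/(\lambda f)]\,
    \mathcal{Q}[-\lambda^{2}d]\,\mathcal{F}
    && \text{(B8) and } \mathcal{F}\mathcal{F}^{-1}=1 \\
 &= \mathcal{Q}[1/f]\,\mathcal{Q}[-d/f^{2}]\,
    \mathcal{V}[1/(\lambda f)]\,\mathcal{F}
    && \text{(B10) with } a = 1/(\lambda f),\ b = -\lambda^{2}d \\
 &= \mathcal{Q}\!\left[\tfrac{1}{f}\left(1-\tfrac{d}{f}\right)\right]
    \mathcal{V}[1/(\lambda f)]\,\mathcal{F}
    && \text{(B8)}
\end{align*}
```

The residual quadratic phase vanishes for d = f, which is the sense in which the Fourier transform relation becomes exact.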
APPENDIX C


APPLICATIONS  PAPERS 

One  of  the  early  suggested  applications  of  massive 
parallelism  was  to  cellular  array  processing  (Opt.  Eng.  25,  825). 
This  applies  as  well  to  symbolic  substitution  as  practiced  by  Bell 
Laboratories  (Alan  Huang)  and  to  simple  associative  predictors 
(Opt.  Eng.  25,  1179). 

Perhaps  the  most  important  SDI  application  is  to  massive 
parallel  optical  data  base  management  (SPIE  938,  52  and  Appl. 
Opt.  29,  Vol.  2,  195).  This  is  the  fastest  way  to  search  gigabit 
files. 


Of course, these holographic memories can store templates
for pattern recognition (SPIE 754, 74) or switching patterns for
binary optical devices (SPIE 769, 101) or generalized mapping
operations (SPIE 881, 56). The concept of stacked holograms
for this purpose also has some promise (SPIE 883, 203).


Systolic optical cellular array processors


H. John Caulfield, FELLOW SPIE
The  Center  for  Applied  Optics 
The  University  of  Alabama  in  Huntsville 
Huntsville,  Alabama  35899 


Abstract. Using space-variant pattern recognition of up to 256 3×3 patterns
of 1's and 0's in parallel and inserting image information
sequentially in a well-defined pattern, we can construct an optical
systolic cellular array processor for 3×3 neighborhoods that produces
output points at one-third the rate at which points are input. This allows
reprogrammable preprocessing of data input.

Subject terms: Space Station optics; optical processing; cellular array
processors; optical computing; systolic arrays.
1. INTRODUCTION

One  of  the  announced  goals  of  the  Space  Station  program  is 
to  include  robots  for  operation  and  repair  in  the  space  en¬ 
vironment.  While  not  yet  R2D2  clones,  these  robots  must 
perform  tasks  well  beyond  the  current  state  of  the  art. 
Typical  of  such  tasks  is  the  tracking  (six  kinematic 
parameters  per  object)  of  multiple  (say,  0  to  M)  objects 
from  a  known  set  of  N  >  M  possible  objects.  The 
background  clutter  is  unpredictable.  Occultations  are  prob¬ 
able.  Lighting  will  be  nonuniform.  Response  times  must  be 
very  fast,  say,  TV  frame  time. 

Many "tricks" must be applied to make this happen. The
Ames Research Center is working on intelligent optical pattern
recognition and optical control processing.* The Jet
Propulsion Laboratory is developing rapid coherent optical
data base search methods.† The Johnson Space Center is
developing coherent optical pattern recognizers with invariances
to various of the six kinematic parameters.‡
Besides these internal NASA programs (this list is almost
certainly incomplete), programs must be developed outside
NASA as well if optics is to play a powerful role in these
robots. A marriage of work in optical processing both inside
and outside NASA will be required for these robots.


*D. Cliffone, private communication (1985).

†H. K. Liu, private communication (1985).

‡R. Juday and Michael Duff, private communications (1985).
One task not being attacked is very rapid (relative to the
frame  time)  nonlinear  image  preprocessing.  We  have  in 
mind  tasks  such  as  skeletonization,  median  filtering,  feature 
location,  and  noise  removal.  Such  tasks  are  well  suited  to 
modern  cellular  array  processors,  and  the  speeds  of  some  of 
these  are  essentially  fine  for  the  task.  On  the  other  hand,  if 
we  wish  to  do  many  such  operations  in  a  frame  time  (a 
strong  likelihood  in  view  of  the  iterative  nature  of  many  of 
the  algorithms),  new  technologies  may  be  needed.  Also,  we 
would  like  flexibility  to  program  the  cellular  array  processor 
to  perform  noniterative  sequences  of  operations.  These 
tasks  may  be  facilitated  by  an  optical  cellular  array  pro¬ 
cessor. 

2.  CELLULAR  ARRAY  PROCESSORS 

Cellular  array  processors  are  simply  regular  arrays  of  locally 
interconnected  synchronous  processors,  or  cells.  There  is  a 
well-defined  cycle  time  in  which  each  cell  receives  informa¬ 
tion  from  all  its  neighbors,  performs  its  characteristic 
calculation,  and  has  its  value  replaced  by  a  new  value.  Nor¬ 
mally,  there  is  a  one-to-one  mapping  of  cells  onto  pixels.1 

We  consider  here  only  finite  impulse  response  (FIR) 
operations,  or  neighborhood  operations.2  For  this 
preliminary  study  we  specialize  to  a  very  small  but  ser¬ 
viceable  3x3  neighborhood  in  a  square  array.  Furthermore, 
we  specialize  to  a  binary  image.  Removing  both  specializa¬ 
tions  is  possible  but  difficult  enough  to  be  a  distraction  in 
this  initial  study.  More  general  and  complex  optical  cellular 
array  processors  have  been  proposed  by  Tanida  and 
Ichioka.3  *  By  specializing,  we  can  simplify  the  design  con¬ 
siderably. 

The  approach  we  use  is  cell  replacement.  Rather  than  ex¬ 
plain  this  method  abstractly,  we  offer  some  trivial  examples 
that  should  make  the  generalization  obvious. 

Suppose we want to recognize the corners of all objects in
the scene. We can do this by replacing the central pixel in a
3×3 neighborhood by a 1 if the cell has any of the four patterns
shown in Fig. 1. All other patterns will produce a 0 in
the center pixel.
Fig. 1. 3×3 neighborhoods that lead to a 1 in the output for corner
recognition.


Now suppose we wish to use a median filter to smooth out
noise without blurring "real edges." Then every 3×3 pattern
with five or more 1's will lead to the center cell being
replaced by a 1. All other patterns lead to a 0.

Readers unfamiliar with these concepts might wish to "invent"
some "algorithms." For instance, a shift-right-one-cell
substitution is easy to discover.
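
The cell-replacement rules just described can be sketched as a lookup over all 512 binary 3×3 patterns. The Python below is only an illustration of the idea: the function names, the zero padding at the image border, and the small test image are assumptions made here, not part of the optical design.

```python
from itertools import product


def neighborhoods(image):
    """Yield (row, col, patch), with patch the 3x3 neighborhood read row by row.

    Pixels outside the image are taken as 0 (an assumed padding choice).
    """
    rows, cols = len(image), len(image[0])

    def pixel(r, c):
        return image[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    for r in range(rows):
        for c in range(cols):
            patch = tuple(pixel(r + dr, c + dc)
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            yield r, c, patch


def build_table(rule):
    """Tabulate a replacement rule over all 2**9 = 512 binary patterns."""
    return {p: rule(p) for p in product((0, 1), repeat=9)}


def apply_table(image, table):
    """Replace every cell by the table entry for its neighborhood."""
    out = [[0] * len(image[0]) for _ in image]
    for r, c, patch in neighborhoods(image):
        out[r][c] = table[patch]
    return out


# Median rule from the text: 1 if five or more of the nine cells are 1.
median_table = build_table(lambda p: 1 if sum(p) >= 5 else 0)
# Shift-right-one-cell: the new center takes the value of its left neighbor.
shift_right_table = build_table(lambda p: p[3])

image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],   # single-pixel hole inside a 3x3 blob
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
smoothed = apply_table(image, median_table)
shifted = apply_table(image, shift_right_table)
```

With this test image the median rule fills the single-pixel hole at the center, and the shift rule moves every row one cell to the right.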


3.  GENERAL  APPROACH 

There are "only" 2^9 = 512 possible patterns. All we need to
do is recognize at most 2^8 = 256 patterns, the worst case
being when 1- and 0-yielding patterns are equal in number.
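
The counting argument can be checked directly: only the smaller of the two output classes needs to be recognized, and everything else defaults to the other output. The sketch below is illustrative; `masks_needed` and the `isolated_point` rule are hypothetical names introduced here.

```python
from itertools import product

PATTERNS = list(product((0, 1), repeat=9))   # all 2**9 = 512 neighborhoods


def masks_needed(rule):
    """Number of patterns that must be recognized explicitly for a rule."""
    ones = sum(1 for p in PATTERNS if rule(p) == 1)
    zeros = len(PATTERNS) - ones
    return min(ones, zeros)


def median(p):
    # Rule from the text: five or more 1's give a 1.
    return 1 if sum(p) >= 5 else 0


def isolated_point(p):
    # Hypothetical extra rule: 1 only for a lone 1 at the center (index 4).
    return 1 if p[4] == 1 and sum(p) == 1 else 0


median_masks = masks_needed(median)
isolated_masks = masks_needed(isolated_point)
```

The median filter turns out to be a worst case, since exactly half of the 512 patterns contain five or more 1's, whereas a sparse rule such as the isolated-point detector needs a single mask.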

If  we  use  space-variant  pattern  recognition,  i.e.,  if  we 
control  where  the  pattern  appears,  the  pattern  recognition 
becomes  trivial. 

On  the  other  hand,  only  one  pattern  at  a  time  can  be  in 
any  particular  location.  This  suggests  that  only  one  output 
pixel  at  a  time  can  be  generated.  As  we  move  the 
neighborhood  from  one  pixel  to  the  next  in  raster  format, 
six  of  the  nine  pixels  stay  in  the  neighborhood,  three  drop 
out,  and  three  are  added.  This  suggests  a  pulsating  flow  pat¬ 
tern,  in  fact,  a  systolic  array  processor. 

Accordingly,  we  have  designed  an  optical  systolic  array 
processor  for  cellular  array  processing.  Because  the  pro¬ 
cessor  is  pipelined,  it  generates  outputs  at  a  rate  propor¬ 
tional  to  the  input  rate.  A  simple  systolic  cellular  array  pro¬ 
cessor  of  the  type  to  be  described  can  move  data  out  at  one- 
third  of  the  data  input  rate.  If  the  recognition  occurs  in 
parallel  (not  unreasonable  for  256  "channels”),  the  input  is 
the  effective  rate  limiter. 

The  rest  of  this  paper  shows  one  way  of  doing  this.  Many 
other  ways  (some  without  pulsing,  some  doing  multiple  lines 
at  once,  etc.)  will  occur  to  optics-oriented  readers.  The 
method  we  show  was  chosen  for  didactic  and  constructive 
simplicity. 

Let  us  label  the  cells  as  follows: 

(1,1)  (1,2)  . . .  (1,N)

(2,1)  (2,2)  . . .  (2,N)

  .      .             .

(M,1)  (M,2)  . . .  (M,N).

Any size or shape of neighborhood can be defined.
Somehow padding (pixels around the edges of images) must
be defined to allow edge cells to be calculated. A sample
neighborhood is the 3×3 cell. For instance, the "image" of
cell (2,2) (call it (2,2)') depends on the pattern of 1's and 0's
in the subarray

(1,1)  (1,2)  (1,3)

(2,1)  (2,2)  (2,3)

(3,1)  (3,2)  (3,3).


We now show our proposed method for doing the above
neighborhood operations optically. We first map the 2-D
neighborhood into a 1-D neighborhood in an appropriate
way. For (2,2)', the neighborhood is

(1,3)
(2,3)
(3,3)
(1,2)
(2,2)
(3,2)
(1,1)
(2,1)
(3,1).

For (2,3)', the neighborhood is

(1,4)
(2,4)
(3,4)
(1,3)
(2,3)
(3,3)
(1,2)
(2,2)
(3,2).


Note that moving from (2,2)' to (2,3)' involves moving the
existing pixels down three positions and adding three new
ones at the top. If a shift occurs in Δt, the pattern is
LLLLLLLLRLLRLLRLLRLLR..., where L is the load
operation and R is the read operation. Flowing pulsating
calculations in essentially identical local processing units
have come to be called "systolic."
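
The load and read timing can be sketched with a nine-place shift register. This is an illustration only, under two assumptions made here: the bottom pixel of each image column enters first, so that row 1 ends up at the top of the register, and the read takes place in the same time slot in which the third pixel of a column arrives.

```python
from collections import deque


def systolic_neighborhoods(band):
    """Stream a three-row band of 0/1 pixels through a nine-place register.

    Yields ("L", None) for a load-only slot and ("R", window) for a read
    slot, where window lists the register contents from top to bottom.
    """
    register = deque(maxlen=9)      # index 0 is the top of the cell
    slot = 0
    for col in range(len(band[0])):
        for row in (2, 1, 0):       # bottom pixel of the column enters first
            register.appendleft(band[row][col])
            slot += 1
            if len(register) == 9 and slot % 3 == 0:
                yield "R", tuple(register)
            else:
                yield "L", None


band = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 1],
]
steps = list(systolic_neighborhoods(band))
schedule = "".join(op for op, _ in steps)
windows = [w for _, w in steps if w is not None]
```

For this five-column band the schedule is eight loads, a read, and then one read for every three slots, so output points appear at one-third of the input rate.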

In the simplest case the data will be flowed in using an
acousto-optic cell. For this purpose, we can envision an
acousto-optic cell as a shift register of optical transmissions.
That is, a set of optical transmission values of 1's and 0's
can be inserted into the top of such a cell. They thereafter
flow through the cell at a uniform speed.

After  the  initial  pipeline  loading,  an  appropriate  nine  pix¬ 
els  are  present  to  calculate  a  new  image  cell. 

Let us use two side-by-side acousto-optic cells (in the same
material for convenience) to represent a neighborhood.
Thus, the 2-D to 1-D mapping converts

1  1  0
1  1  0
0  0  0

to

0
0
0
1
1
0
1
1
0.

Our  new  mapping  converts  this  to 

0  1 
0  1 
0  1 
1  0
1  0 
0  1 
1  0 
1  0 
0  1. 

An optical signal with light of a strength 0 or 1 in this pattern
is easy to produce with a two-cell acousto-optic device.
Both 1's and 0's in the original cell are now represented as
light-on positions.

A  recognition  spatial  mask  with  transmissions  1  and  0  can 
be  inserted  into  the  optical  pattern.  Indeed,  up  to  256  binary 
matched  filters  can  be  addressed  in  parallel  using  a  typical 
spatial  light  modulator  (SLM)  as  a  mask. 

When  an  exact  match  between  the  input  signal  and  the 
mask  occurs,  the  optical  signal  integrated  over  all  nine  (18) 
cells  is  9.  No  other  mask  can  give  a  signal  higher  than  8. 
Thus,  even  analog  optics  can  yield  an  extremely  low 
misidentification  rate. 

Any signal (using multiple masks) greater than, say, 8.5
will give an output 1. All other signals give a 0. Figure 2
shows the sequence of operations just discussed.
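
The matching and thresholding steps can be checked exhaustively. The sketch below assumes, as the text describes, that each original bit b is encoded as the pair (b, 1 - b) and that the mask for a pattern is simply its own encoding; the function names are introduced here for illustration.

```python
from itertools import product


def dual_rail(bits):
    """Encode each bit b as the pair (b, 1 - b): 9 bits become 18 cells."""
    return tuple(v for b in bits for v in (b, 1 - b))


def mask_signal(signal, mask):
    """Light passed by a 0/1 mask, integrated over all 18 cells."""
    return sum(s * m for s, m in zip(signal, mask))


THRESHOLD = 8.5
patterns = list(product((0, 1), repeat=9))
target = (0, 0, 0, 1, 1, 0, 1, 1, 0)   # the example column from the text
mask = dual_rail(target)                # matched mask for the target pattern

scores = {p: mask_signal(dual_rail(p), mask) for p in patterns}
matches = [p for p, s in scores.items() if s > THRESHOLD]
best_mismatch = max(s for p, s in scores.items() if p != target)
```

Every encoded pattern has exactly nine lit cells, the matched mask passes all nine, and any other input passes at most eight, so a threshold between 8 and 9 singles out the target.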

Having  given  an  overview,  we  now  backtrack  and  cover 
some  mechanisms  briefly. 

Acousto-optic cells are crystals with attached transducers
for launching bulk rf acoustic waves into them. The compaction
and rarefaction of the crystal by the sound waves
produces an instantaneous diffraction grating that propagates
through the crystal at the speed of sound. Any
modulation of that rf carrier modulates the diffracted light.
Schlieren optics is used to image only the diffracted
portion of the incident light. The effect produced is that of a
moving amplitude pattern, exactly what is needed according
to the analysis just presented.

Fixed  recognition  masks  can  be  made  photographically, 
but  SLMs  allow  us  to  build  a  real-time  reprogrammable  op¬ 
tical  cellular  array  processor. 

4.  ASSESSMENT 

This simplest optical cellular array processor inputs data at
the acousto-optic bandwidth B and outputs data at a rate
B/3. For a frame of 500 × 500 pixels and a reasonable bandwidth
of 90 MHz this leads to 120 frames per second. More
complex processors will go even faster.

Fig. 2. Sequence of transformation from a 3×3 binary array to a
unique 2×9 array of 9 "on" and 9 "off" cells that, when passed
through a matching mask, summed, and thresholded (maximum
signal = 9, threshold T = 8.5), uniquely identifies the particular
cell configuration.
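
The frame-rate figure follows from simple arithmetic, sketched below. The function name is introduced here, and the calculation assumes that one pixel enters per cycle of bandwidth, so that the input rate equals B pixels per second.

```python
def frames_per_second(bandwidth_hz, rows, cols, inputs_per_output=3):
    """Frame rate when one output pixel is produced per three input pixels."""
    output_rate = bandwidth_hz / inputs_per_output   # output pixels per second
    return output_rate / (rows * cols)


fps = frames_per_second(90e6, 500, 500)   # 30e6 pixels/s over 250000 pixels
```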
1. INTRODUCTION

While most attempts at optical computing have aimed at
numerical processing1'5 or numerically assisted reasoning,4-1
little effort seems to have been devoted to symbolic processing.
In particular, prior work on optical symbolic processors
has aimed at drawing inferences from input data, stored data,
and rules. This paper is intended to show that optical computers
can do far more than that. In principle, they can be taught
to speak English, tell stories, play simple games, etc. The
particular approach shown here is chosen because, of all the
non-inference-drawing schemes I have discovered, it appears
to be the simplest to implement optically. No claim to optimality
in any sense is made; rather, I hope to draw some
attention to this heretofore largely neglected task for optical
processors.

By symbols we mean what is meant in normal conversation:
letters, words, numbers, events, concepts, etc. We
assume that these symbols can be listed, i.e., that they are
countable. This means that we could make a list of them and
assign a number to each. Our object will be to produce meaningful
and useful strings of symbols: sentences, equations,
reactions, etc. One of the good features sought is innovation,
although we may wish to control its rate of production.

2.  JOINT  CONTEXT  NETWORK  (JCN) 

The  essence  of  the  JCN  is  to  remember,  predict,  or  postulate 
the  next  symbol,  given  the  previous  N  (as  well  as  prior  teach¬ 
ing)  as  the  context.  We  will  call  N  the  context  depth. 

An example may prove helpful. I have just read an article
on speech recognition. A typical sentence begins: "The differences
between normal and ...." With (1) your knowledge of
the structure of the English language and (2) the context of
that beginning, you have little difficulty in "predicting" that
the next word will be "abnormal." Can we make an optical
symbolic computer that can do the same? Confronted with a
new situation, can we generate an appropriate response based
on past learning plus trained "insight"?

A wonderfully readable book by J. H. Andreae, titled
Thinking with the Teachable Machine,* gives the details of
the context-driven approach we call JCN. I show here a
simple example. Say my task is weather forecasting. I want
to know if today will be clear (C), rainy (R), or partly cloudy
(P). The past few weeks have been
CCCPPRRPCCCCCCRPCCPCCCCCCP. What should I predict next? For
N = 2 JCN, we learn

CC -> C (the first two symbols implied the third),
CC -> P,
CP -> P,
PP -> R,
PR -> R,
RR -> P,
RP -> C,
PC -> C,
CC -> C (earlier we had CC -> P),
CC -> C (a popular implication),
CC -> C,
CC -> C,
CC -> R,
CR -> P,
RP -> C,
PC -> C,
CC -> P,
CP -> C,
PC -> C.


Our N = 2 context is CP. Based on prior observations, the two
possible predictions for the next observation are P and C. We
can choose one at random or seek further context. The N = 3
JCN also would predict P and C, so that is no help. An N = 4
JCN would predict P; that may be our best bet.

Fig. 1. For context N = 2, we need two bistability arrays (A and B). In
(a), we see concept C112 being entered as a column of light for one
panel (A). In (b), concept C033 is entered on both. Because the C112 is
still on, the threshold is exceeded at the intersection. This occurs only
when C112 is followed by C033. In (c), we have added concept C400.
Because the C033 light is still on, the intersection in B exceeds threshold
and indicates that concept C033 followed by C400 has just
occurred. Holograms illuminated by light passing through those
intersection points spread light across detectors or other bistable
panels if greater than depth 2 is required.
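
The weather example can be reproduced in a few lines. This is a software sketch of the context-table idea only, not of the optical implementation; the function names are introduced here, and the observation string is assumed to run continuously across the line break in the printed text.

```python
from collections import defaultdict


def learn(history, depth):
    """Map every context of `depth` symbols to the symbols that followed it."""
    table = defaultdict(list)
    for i in range(len(history) - depth):
        table[history[i:i + depth]].append(history[i + depth])
    return table


def predict(history, depth):
    """Distinct symbols that have followed the current context so far."""
    context = history[-depth:]
    return sorted(set(learn(history, depth)[context]))


# C = clear, R = rainy, P = partly cloudy (the 26 observations from the text).
weather = "CCCPPRRPCC" + "CCCCRPCCPCCCCCCP"
predictions = {n: predict(weather, n) for n in (2, 3, 4)}
```

Depths 2 and 3 leave both C and P as candidates, while depth 4 leaves only P, in agreement with the discussion above.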

3.  ENCODING 

Since, by hypothesis, the symbols are countable, we can put
them in one-to-one correspondence (order unimportant) with
the counting numbers 1, 2, 3, .... Suppose, for the moment,
that the number of symbols is small, say 512, and the JCN has
a context of only N = 2. We use two 512 × 512 arrays of
optical bistable devices. Call them A and B.

Figure 1 shows how A and B are addressed if the symbol
string is C112 C033 C400. After the context C112 C033 is established,
a unique intersection occurs in A. The context C033 C400 is a
unique intersection in B. The next intersection occurs in A, etc.

Thus odd-numbered (1st, 3rd, ...) concepts must be horizontal
in A and vertical in B, while the reverse is true of
even-numbered concepts.

To address a concept, we must know whether it is odd or
even. We then either turn on a light beam or deflect a light
beam to the proper position. The beam then strikes a hologram,
which spreads it as required. Each beam is on for the
length of the context (two "read times" in this example). Each
beam has strength 1, and a threshold of around 1.5 detects
coincidences. The transmitted light or (better) newly generated
light from a bistable laser strikes another hologram,
which (a) gives the prediction or memory or (b) gives the set or
weighted set of predictions or (c) sends data on to other
processor arrays.
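
The coincidence detection can be illustrated numerically. In the sketch below one concept lights a full row and the next lights a full column of a 512 × 512 array, each with strength 1, and only cells above the threshold of 1.5 respond. The symbol indices 112 and 400 and the function name are chosen here purely for illustration.

```python
SIZE = 512
BEAM = 1.0        # strength of each addressing beam
THRESHOLD = 1.5   # a cell responds only above this level


def coincidences(row_symbol, col_symbol, size=SIZE):
    """Cells whose summed illumination exceeds the threshold."""
    hits = []
    for r in range(size):
        for c in range(size):
            power = ((BEAM if r == row_symbol else 0.0)
                     + (BEAM if c == col_symbol else 0.0))
            if power > THRESHOLD:
                hits.append((r, c))
    return hits


hits = coincidences(112, 400)
```

Of the 262144 cells, 1023 receive light from at least one beam, but only the single cell at the intersection receives both and crosses the threshold.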

Cascading  a  number  of  these  systems  to  use  a  depth  greater 
than  2  is  straightforward. 

4.  EXAMPLE  CASE 

Suppose we want a 512 symbol N = 2 JCN. We postulate two
512 × 512 arrays of bistable lasers that emit only if struck with
light of power at least I_T. Each symbol can be represented by
one  of  512  A  sources  or  one  of  512  B  sources.  Each  of  these 
1024  sources  is  followed  by  its  own  hologram,  which  directs 
uniform  vertical  and  horizontal  light  beams  toward  the  A 
and B bistable laser arrays. The power that a single beam delivers
to each illuminated laser is less than I_T but more than I_T/2.
Thus, only at the intersection does a
laser  beam  arise.  That  laser  beam,  in  turn,  strikes  its  own 
hologram,  which  causes  the  light  to  predict  or  remember 
something  from  the  joint  conjunction  by  illuminating  one  or 
more  of  512  detectors. 

Programming  or  teaching  is  embodied  in  the  two  512X512 
arrays  of  memory  holograms.  If  the  output  (22  X  23  =  506  plus 
an  extra  row  of  6)  is  the  joint  Fourier  transform  of  both  2-D 
laser-hologram  arrays,  the  programming  holograms  are  simply 
properly  aligned  and  spaced  gratings. 

5.  ANALYSIS 

This  paper  is  intended  to  illustrate  a  new  direction  for  optical 
computing:  symbolic  processing.  Optical  parallelism  makes 
speed  independent  of  the  number  of  symbols  processed, 
although  hardware  complexity  does  increase.  The  JCN  speed 
can  be  made  independent  of  N  (the  context  depth)  by  prepar¬ 
ing  some  contexts  while  reading  others.  Again,  the  price  is 
complexity. 
ABSTRACT

Using Page Oriented Holographic Memories (POHMs), optically addressed Spatial Light Modulators (SLMs),
joint transform correlators, 2D or 1D acousto-optic cells, and optically addressable RAMs we can produce a
massively parallel optical data base management system.

I. INTRODUCTION

Optical Data Base Management Systems (DBMS) operating with massive parallel read in from memory, query,
and read-out to electronics would offer huge advantages over the current bit-oriented or hoped-for
byte-oriented systems if

•  the data base is too massive for conventional DBMS systems,

•  the access time required is too short for conventional DBMS systems, or

•  (preferably for optics) both.

When there is need to search huge data bases very fast, we are automatically in the big system domain which
will not exclude fairly complex optical systems. Therefore, the objective of this work is to explore
massively parallel optical DBMS.


II. KEY INGREDIENTS

The key ingredients are optical systems for

•  massive parallel read into an optical system from a large data store,

•  parallel query on the whole "page" in the optical system,

•  parallel read out from the optical system to output electronics, and

•  parallel "intelligent" control of the operation.

All of these tasks can and probably should be performed optically.

For brevity, we deal here only with the first two tasks. A separate paper will discuss the latter two.

III. PARALLEL READ IN

Obtaining whole pages of data in parallel is the domain of page oriented holographic memories or POHM
(1). Each page is represented by its own spatially discrete hologram on a large substrate. Storing a 256 x
256 page requires a roughly one millimeter hologram. We can store 10^6 of these on a 1 m x 1 m substrate. The
holograms are accessed individually by deflecting a laser beam to the selected hologram. Whichever hologram
is selected produces its page at the same physical location. There it strikes an optically addressed
Spatial Light Modulator (SLM) which reads the full page into the optical system. This is shown in Figure 1.

Fig. 1. The deflector addresses a single hologram from
the POHM which writes a page of data in
IV. PARALLEL QUERY

The first step in parallel query is to restrict the field of regard to items of interest. We may have
the data base arranged in columns such as

Our task is to find all people with the second name "John" with 54 as the fourth and fifth numbers in
their SS# (Social Security Number). Using an electronically addressed SLM we illuminate only the columns
for second names and fourth and fifth numbers as shown in Figure 2.

We do this by template matching in parallel using a joint transform correlator as shown in Figure 3.
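
The two query steps, column selection followed by template matching on every row at once, can be mimicked in software. The sketch below is only an analogy to the optical operation: the records, the fixed-width layout, and all names in the code are made up here for illustration.

```python
# Made-up records for illustration only: (first name, second name, SSN digits).
RECORDS = [
    ("ALICE", "JOHN", "123546789"),
    ("BRIAN", "JOHN", "987654321"),
    ("CAROL", "JANE", "111541111"),
    ("DAVID", "JOHN", "000540000"),
]
NAME_WIDTH = 6


def to_row(first, second, ssn):
    """Lay one record out as a fixed-width row of the page."""
    return first.ljust(NAME_WIDTH) + second.ljust(NAME_WIDTH) + ssn


page = [to_row(*record) for record in RECORDS]
row_length = len(page[0])

# Illumination mask: the second-name columns plus the 4th and 5th SSN digits.
ssn_start = 2 * NAME_WIDTH
lit = set(range(NAME_WIDTH, 2 * NAME_WIDTH)) | {ssn_start + 3, ssn_start + 4}

# Template row; only the illuminated columns are ever compared.
template = to_row("", "JOHN", "   54    ")


def matches(row):
    """True when the row agrees with the template in every lit column."""
    return all(row[i] == template[i] for i in lit)


selected = [i for i, row in enumerate(page) if matches(row)]
```

Rows 0 and 3 are selected. In the optical system the comparison is made on all rows of the page simultaneously rather than in a loop.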
V. READOUT AND CONTROL


We will only hint at these items here. The selected rows output must be accumulated continuously on
parallel read in RAM for subsequent use. Figure 4 shows the basic concept.

Control is the most difficult part if we wish to replace exhaustive search with heuristic search. In a
later paper we will show an adaptive optical neural network suitable for this purpose.
Figure 7 OUTPUT SELECTION


We can use an electronically addressed SLM (SLM3) to write a reference pattern to be
matched with the light coming from the driven SLM (SLM1). Fourier transform
lens 1 jointly transforms both outputs onto SLM4, where they add coherently. Another
readout beam addresses SLM4, which produces bright light on the output plane where the
reference pattern is matched with a portion of the input beam (from SLM1).


Figure 8 PATTERN SELECTION


The 2D AO cell shifts the image in any direction by changing the direction of
light in the Fourier plane.


Figure 9 IMAGE SHIFTING


Optical  database/knowledgebase  machines 


P.  Bruce  Berra,  Karl-Heinz  Brenner,  W.  Thomas  Cathey,  H.  John  Caulfield,  Sing  H.  Lee,  and  Harold  Szu 


In  this  paper  we  discuss  various  aspects  of  databases  and  knowledgebases  and  indicate  how  optics  can  play  an 
important  role  in  the  solution  of  many  of  the  previously  unsolved  problems  in  this  field. 


I.  Introduction 

A  database  is  a  collection  of  interrelated  data  and 
during  the  past  10  years  the  word  database  has  become 
somewhat  of  a  household  word.  This  has  occurred 
because  of  the  ever  increasing  use  of  databases  and  the 
realization  of  their  considerable  influence  on  our  daily 
lives.  They  are  indispensable  to  airlines,  automobile 
companies,  grocery  chains,  department  stores,  hospi¬ 
tals,  colleges  and  universities,  local  and  state  govern¬ 
ments,  and  the  federal  government.  Their  existence  is 
so  important  that  in  many  organizations  the  database 
is  considered  a  resource  as  are  personnel  and  raw  mate¬ 
rials. 

Since  software  database  management  systems 
(DBMS)  often  exhibit  poor  performance,  considerable 
research  has  been  devoted  to  specialized  hardware  de¬ 
vices,  called  database  machines,  to  seek  performance 
improvement.  These  devices  take  advantage  of  the 
significant  advances  in  electronic  hardware  technology 
by  moving  software  functions  into  hardware  and  pos¬ 
sess  considerable  parallelism. 

One of the most rapidly growing fields of artificial
intelligence (AI) is knowledgebase systems. A knowledgebase
consists of rules and facts about particular
domains of interest, and knowledgebase management
systems are concerned with inferencing on the knowledgebase,
as well as other functions. The most well-known
system of this variety is the expert system.
Current expert systems exist or are being constructed
in business, medicine, national defense, and engineering.1
There has been relatively little research directed
to the development of knowledgebase machines.2

There  is  a  great  deal  of  commonality  between  data¬ 
base  systems  and  knowledgebase  systems.  In  fact, 
there  is  considerable  research  and  development  cur¬ 
rently  going  on  which  is  aimed  at  the  integration  of  the 
two  types  of  system.  One  of  the  results  of  this  integra¬ 
tion  is  the  requirement  for  increased  performance  of 
the  integrated  system  over  either  of  the  individual 
systems.  Database  machines  have  had  as  their  objec¬ 
tive  an  increase  in  the  performance  of  the  database 
system  primarily  in  addressing  problems  that  have  a 
very  large  database  and/or  a  real  time  requirement. 
While  the  performance  of  these  systems  has  been  im¬ 
proved  somewhat,  they  have  not  yielded  the  results 
that  were  desired. 

In  dealing  with  the  types  of  very  large  and/or  real 
time  problem  that  we  are  interested  in,  it  is  natural  to 
look  to  optics  for  possible  solutions.  This  is  due  to  the 
large  storage  capacities  available  through  the  use  of 
optical  media  and  the  inherent  speed  and  parallelism 
of  light.  Thus  we  examine  here  the  potential  perform¬ 
ance  improvements  obtainable  from  optical  database/ 
knowledgebase  machines. 

We  begin  by  considering  database  management,  da¬ 
tabase  machines,  and  knowledgebase  management. 
We  then  present  a  paradigm  for  analyzing  the  poten¬ 
tial  advantages  of  optics.  This  is  followed  by  sections 
on  storage  strategies,  access  strategies,  and  processing 
of  data  prior  to  conversion  to  electronic  form.  Finally, 
we  summarize  our  analyses  and  cite  some  future  direc¬ 
tions  that  hold  considerable  promise. 

II.  Background 

A.  Database  Management 

A database management system is a software program
that is concerned with the task of controlling and
managing the database as a resource independent of
the  computer  hardware  that  hosts  it  and  application 
programs  that  interface  with  it.  The  DBMS  must 
have  the  facility  to  establish  the  database  within  the 
system  in  response  to  the  database  designers.  The 
DBMS  must  make  the  data  available  to  a  wide  variety 
of  users  ranging  from  external  application  programs  to 
a  casual  user  posing  a  particular  query.  Inevitably, 
the  database  must  be  updated.  That  is,  new  data  must 
be  added,  old  data  must  be  deleted,  and  existing  data 
must  be  changed.  Thus,  the  DBMS  must  also  have 
the  capability  for  performing  these  updates.  In  fact, 
many  databases  (e.g.,  airline  databases)  have  almost  as 
much  update  activity  as  query  activity.  Of  course 
there  are  many  types  of  database  that  have  limited  or 
controlled  update  activity  (e.g.,  various  forms  of  text 
databases). 

The DBMS must provide the facility for ensuring the
integrity of the database. This is obtained through
various  consistency  checks  and  backup  and  recovery 
systems.  Finally,  the  DBMS  must  regulate  access  to 
the  database  to  protect  it,  the  system  itself,  and  the 
privacy  of  users. 

It  is  not  surprising  that  DBMSs  which  furnish  all  of 
this  functionality  tend  to  be  expensive  and  require 
considerable  computing  resources  to  be  effective. 
While it is true that one can purchase DBMSs for
personal computers, these systems do not possess all the
functionality discussed above and are therefore not the
focus of this paper. Rather, we are concerned with
systems that must deal with very large databases
(VLDB) (hundreds of gigabytes) and/or have a real
time requirement (1 s or less response time).

Since the DBMS is just another application
program, albeit with considerable subprograms, it must
adhere to normal execution procedures just as other
programs. The database user (a human user or
application program) interacts with the DBMS through a
query  language  (or  other  language)  to  accomplish  a 
task.  The  DBMS  must  interact  with  the  operating 
system  to  obtain  data  from  the  database  which  is 
stored  on  the  computing  system’s  secondary  memory. 
Since  the  operating  system  must  satisfy  a  large  number 
of  types  of  user,  the  size  of  the  block  of  data  retrieved 
from  disk  is  optimized  for  all  users  and  is  thus  fixed. 
The  block  of  data  is  placed  in  main  memory  and  turned 
over  to  the  DBMS  which  sifts  through  it  to  find  what  it 
wants.  There  may  be  little  data  of  interest  to  the 
DBMS  due  to  the  organization  of  the  data  and  type  of 
query. Thus, the DBMS may have to ask the operating
system for many pages of data to satisfy a query.
This repeated access to secondary storage considerably
degrades the performance of the DBMS since the
access  time  to  the  disk  is  about  one  million  times 
slower  than  access  to  main  memory.  This  disparity  is 
called  the  access  time  gap. 

B.  Database  Machines 

A typical structure for a database machine is that of
a frontend-backend system. That is, the user interacts
with a sequential computer host which transforms
the request into a series of commands that can be
executed by the backend database machine. The
database machine handles all database functions and
returns results to the host which then passes them on
to the user. There are many advantages of this
configuration including removal of dependence on the
operating system, reduction in the number of functions
performed (i.e., the database machine only executes
database functions), optimization of execution of certain
functions (e.g., special hardware for relational
operations), mitigation of the access time gap problem
through parallel access to multiple disks and, in general,
the advantages one has in solving a more narrowly
defined problem.

There are also disadvantages to this configuration.
These systems tend to be more costly and less available
than sequential DBMSs. There are dozens of university
and industry database machine projects but there
are just a few commercially available products.1-4
However, there are hundreds of sequential DBMSs. If
the problem being addressed is basically sequential, no
amount of parallelism will help; in fact it may even
degrade performance below that of a sequential
DBMS. For example, if a query consists of several
subqueries each of which depends on the result of the
previous subquery, the traffic across the host-database
machine interface will significantly degrade the
performance of the system.

As  was  previously  pointed  out,  we  are  concerned 
with  VLDB  and/or  real  time  requirements;  database 
machines  also  address  these  requirements.  Thus, 
while  there  may  be  hundreds  of  sequential  DBMSs 
available, only the few of them residing on major
mainframe computers are able to address the requirements.
The  comparison  then  comes  down  to  large  mainframe 
systems  with  DBMSs  vs  database  machines. 

A  problem  addressed  by  most  database  machine 
designs  is  that  of  parallel  access  to  magnetic  disk. 
Their  approaches  are  only  partially  successful  since 
the  difficulty  basically  lies  with  the  mechanics  of  the 
disk.  The  speed  of  rotation  and  the  extremely  small 
distance  between  the  read-write  head  and  the  disk 
surface  are  such  that  the  sustained  transfer  rate  of 
large  commercially  available  magnetic  disks  tops  out 
at ~3 Mbytes/s. The exception is multiactuator,
multihead disks which can increase this rate but at a
considerably greater cost. Because of this current
limitation, database machine architects have designed their
systems to be able to accept and process data from
magnetic disks at these rates. However, if data rates
were available at 300 Mbytes/s, these database
machines would have considerable difficulty dealing with
the  situation.  They  would  become  compute  bound 
rather  than  I/O  bound  as  is  currently  the  case.  We  will 
return  to  this  point  later  in  the  paper. 
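
To put the 3 and 300 Mbytes/s figures in perspective, the following minimal sketch estimates how long one exhaustive pass over a very large database would take at each sustained transfer rate. The 100-Gbyte database size and the function name are illustrative assumptions, not values for any specific system.

```python
def full_scan_seconds(database_bytes, transfer_bytes_per_s):
    """Time to stream an entire database once at a sustained transfer rate."""
    return database_bytes / transfer_bytes_per_s


DATABASE_BYTES = 100e9  # assumed 100-Gbyte VLDB (illustrative)
MAGNETIC_RATE = 3e6     # ~3 Mbytes/s sustained magnetic disk rate cited in the text
FAST_RATE = 300e6       # 300 Mbytes/s hypothetical rate cited in the text

for label, rate in (("magnetic disk", MAGNETIC_RATE),
                    ("300 Mbytes/s source", FAST_RATE)):
    seconds = full_scan_seconds(DATABASE_BYTES, rate)
    print(f"{label}: {seconds:,.0f} s ({seconds / 3600:.2f} h)")
```

At the faster rate the I/O time drops by a factor of 100, so a backend sized for the magnetic disk rate would have to process data roughly 100 times faster to avoid becoming compute bound.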

C.  Applications  Requirements 

In examining data processing in general and database
processing specifically for their applicability to
both near and far term applications, one is faced with a
varied and dynamic set of both operational problems
as well as technological solutions. Rather than
performing the analysis on each of these
problem/solution/timeframe tuples, it is desirable to develop a
generic analysis technique. Specifically, the technique
transforms the system requirements onto three
independent axes: database size, bandwidth of
communications to the disks, and processing rate as shown in
Fig. 1. The size of the database is shown on the vertical
axis and is measured in bytes. The bandwidth is
measured in megabytes per second and provides a
measure of what is required to access disk storage in
solving a variety of problems. Another measure of
bandwidth is the amount of query input data to the
system. However, this input communication bandwidth
results in considerable accesses to the disks as
well as increased processing requirements. Thus we
will use bandwidth to refer to internal I/O bandwidth.

The third axis represents processing rate. In multiuser
environments a large query load generally results
in a large processing load. However, relatively
short complex queries can also result in large processing
loads. But, in general, database management
places more stress on I/O rather than on processing.

As a first example, suppose a sensor has measured
the signature of a radar and one must match this
signature against a library of known signatures in order
to make an appropriate response. Suppose that there
are 10,000 known radar emitter signatures, and a new
emitter appears every 100 s. The required processing
time is not related to the input rate. However, this
application may require a match in 10 ms to make a
timely response. While neither the measured signature
nor the signature database has 100% validity, we
would like the process to be error free. Typically, the
measured signature will have certain parameters that
are correct and some that are in error. In a specific
report, the degree of error is uncertain and the matching
process must allow partial/probabilistic matches.
Thus, a single input may result in many internal
processes. This application is depicted as P1 in Fig. 1.

As a second example, a contact comparison application
might have a database of the order of \(10^{9}\) bytes, a
contact report rate of 1/s, and a match requirement of
2/s. This application is depicted as P2 in Fig. 1.

As  a  third  example,  a  monopulse  signal  sorter  takes  a 
set  of  measured  parameters  on  a  single  radar  pulse  and 
attempts  to  identify  the  source  of  the  pulse  from  a 
database  of  emitters  known  to  be  currently  hearable. 
In  a  complex  environment,  there  may  be  1000  emitters 
hearable  at  one  site,  each  emitting  1000  pulses/s.  The 
expected report rate is then \(10^{6}\) pulses/s. The current
database consists of the 1000 emitters. The time to
locate a match must be <1 μs just to keep up with the
input rate. The front end sensor may produce erroneous
results, for example, when pulses from two different
emitters are overlapping in time and may miss
pulses  that  are  too  weak  to  be  detected.  This  will 
create  two  problems.  First,  there  will  be  holes  in  the 
database  which  will  not  be  filled  in  without  additional 
manipulation  of  the  database;  and  second,  there  will  be 
residual  reports  that  do  not  correspond  to  any  real 
emitter.  This  application  is  depicted  as  P3  in  Fig.  1. 
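
The requirements of the first and third examples can be compared with a short calculation. The sketch below assumes an exhaustive serial search in which every library entry is compared once per incoming report; that search strategy and the function name are assumptions made for illustration, and the partial/probabilistic matching discussed above would only raise these figures.

```python
import math


def serial_comparisons_per_second(library_entries, match_deadline_s):
    """Comparison rate needed if every entry is checked within the deadline."""
    return library_entries / match_deadline_s


# First example (P1): 10,000 known signatures, match required within 10 ms.
p1_rate = serial_comparisons_per_second(10_000, 10e-3)

# Third example (P3): 1000 emitters, each emitting 1000 pulses/s.
p3_report_rate = 1000 * 1000           # pulses/s arriving at the sorter
p3_deadline_s = 1.0 / p3_report_rate   # time budget per pulse
p3_rate = serial_comparisons_per_second(1000, p3_deadline_s)

print(f"P1: {p1_rate:.0e} comparisons/s")
print(f"P3: {p3_report_rate:.0e} pulses/s, {p3_deadline_s:.0e} s per pulse, "
      f"{p3_rate:.0e} comparisons/s")
```

Under this assumption the monopulse sorter needs about a thousand times the comparison rate of the signature matcher even though its database is ten times smaller.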

If one had an infinitely fast serial machine with, say,
100 Gbytes of memory, one could solve all the example
problems, but not necessarily in a cost-effective manner.
Similarly, a very large content addressable memory
could be used to solve all the problems but would
be gross overkill, certainly for the first example. The
challenge, then, is to develop techniques that handle the
above range of examples and are generic functions
of the application and the state of technology, so as to
invest resources in the minimum number of architectures
that solve the collective database processing tasks.

An important consideration is the cost of various
technological alternatives. For example, one can purchase
off-the-shelf chips at about $100/Mbyte and
disk memory at about $50/Mbyte based on commercially
available 256k RAM chips and 100 Mbyte disks.
The advantages and disadvantages of choosing chip or
disk with regard to database size, I/O communication,
and processing rate can be argued in many ways
depending on the task at hand. RAM memory has a very
fast access time (200 ns) but is volatile while disk
memory is slow (30 ms access time) and nonvolatile. In
contemporary systems, RAM appears in limited quantity
while disks appear in large quantity. But, with the
relatively recent reduction in the cost of RAM, this
proportion is expected to shift dramatically in favor of
RAM.
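
The cost and access time figures quoted above can be combined in a small worked example. The sketch below prices a 100-Gbyte store (an assumed size, taken as 100,000 Mbytes) in each technology and computes the ratio of the two quoted access times; the constant and function names are illustrative.

```python
import math

RAM_DOLLARS_PER_MBYTE = 100.0   # quoted price for RAM chips
DISK_DOLLARS_PER_MBYTE = 50.0   # quoted price for disk memory
RAM_ACCESS_S = 200e-9           # quoted RAM access time (200 ns)
DISK_ACCESS_S = 30e-3           # quoted disk access time (30 ms)
CAPACITY_MBYTES = 100_000       # assumed store size: 100 Gbytes


def storage_cost_dollars(capacity_mbytes, dollars_per_mbyte):
    """Purchase cost of a store of the given capacity."""
    return capacity_mbytes * dollars_per_mbyte


ram_cost = storage_cost_dollars(CAPACITY_MBYTES, RAM_DOLLARS_PER_MBYTE)
disk_cost = storage_cost_dollars(CAPACITY_MBYTES, DISK_DOLLARS_PER_MBYTE)
access_ratio = DISK_ACCESS_S / RAM_ACCESS_S

print(f"RAM cost:  ${ram_cost:,.0f}")
print(f"Disk cost: ${disk_cost:,.0f}")
print(f"Disk access is about {access_ratio:,.0f} times slower than RAM")
```

With these particular figures the factor-of-two price advantage of disk is set against an access time penalty of roughly five orders of magnitude.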

Returning to Fig. 1, no single architecture is best for
all the tasks at hand. Specifically for database problems,
it depends on at least the three characteristics
given on the axes. For example, with P1, a moderate
amount of data with little need for bandwidth or
processing describes the problem. In this case, there is
little requirement for special architectures for processing
the data or increasing the bandwidth to disk.

With P2, there is a need for the storage of large
amounts of data, so considerable disk storage is
required. However, there are also moderate requirements
for bandwidth to disk as well as processing
power. In this situation, parallel computer architectures
may be employed with some usefulness. We may
employ a variety of such architectures from single
instruction-multiple data stream to multiple
instruction-multiple data stream. The point here is that we
must now look to parallel architectures to keep up with
the processing load. Also, we must be able to access
considerable amounts of data so the bandwidth to
secondary storage must be high. This will require
parallel access to disks perhaps along the lines of
commercially available database machines.

The  third  problem,  P3,  is  much  more  difficult  to  deal 
with  since  it  has  such  stringent  requirements  on  all 
dimensions. Optical storage may help with this problem
due to its high density, but data will have to be
retrieved from the disk at a faster rate. Optical
interconnects will definitely help because of high bandwidth,
and optical or electrooptical processing may
offer  some  solutions  in  the  future. 

D.  Knowledgebase  Management 

Knowledgebase systems are composed of a knowledgebase
of rules and facts and an inferencing mechanism
that is used to respond to queries using the existing
knowledgebase. In the case of expert systems, the
objective is to capture the knowledge of experts in
particular domains and make it generally available to
nonexperts. Various knowledge structuring techniques
include semantic networks, production rules,
logic,  and  frames  with  the  LISP  and  Prolog  languages 
in  common  use.  Current  expert  systems  tend  to  focus 
on  narrow  domains,  have  small  knowledgebases  and, 
therefore,  have  limited  application.  As  these  systems 
expand  and  more  general  applications  are  considered, 
increasing  demands  will  be  placed  on  the  management 
of  the  knowledgebase.  The  database  of  rules  (called 
the  intensional  database)  will  become  large  but  the 
major management problem will be in the access, update,
and control of the database of facts (called the
extensional  database). 

The above considerations have led to many research
efforts aimed at the interface and eventual integration
of knowledgebase systems and database systems.
Some systems are currently available that provide an
interface between a knowledgebase system and the
DBMS. While this allows for the management of larger
knowledgebases, the performance of such systems is
less than desirable because of the slow interface and
the duplication of functionality of the two systems.
Another approach that is being taken is to extend the
capabilities of the knowledgebase system through the
addition of secondary storage management. Still another
approach is the addition of inferencing functionality
to existing DBMSs. All these approaches are
headed in the direction of an integrated knowledgebase
management system (KBMS) that possesses the
capabilities of both systems. However, when viewed
from a performance perspective, KBMSs will place
even more demands on the underlying data management
structures. That is why it is imperative that we
look to other technologies such as optics for possible
solutions.

Fig. 2. Optical/electronic paradigm.


III. Hierarchical Structure of Processing

The  state  of  the  art  of  electronic  computing  enjoys 
considerable  maturity.  In  contrast,  optics  as  applied 
to  digital  computing  is  very  young  and  has  yet  to  make 
its  mark.  In  assessing  how  optics  may  help  database 
and  knowledgebase  management,  it  seems  clear  that 
the  most  impact  will  be  felt  at  the  lowest  level.  Thus, 
the  approach  that  we  have  taken  in  this  paper  is  an 
optoelectronic  one  in  which  we  start  at  the  very  lowest 
level  and  progressively  move  toward  conversion  to 
electronics  as  indicated  in  Fig.  2.  We  examine  various 
types  of  optical  storage  media  and  devices  to  assess 
their  potential  for  use  in  database  and  knowledgebase 
management.  As  will  be  discussed  later,  the  potential 
exists  for  enormous  data  rates  from  optical  storage. 
Since  electronic  database  machines  are  designed  to 
deal  with  magnetic  disk  transfer  rates,  they  will  not  be 
able  to  handle  these  increased  rates.  This  dictates 
that  we  keep  the  data  in  optical  form  and  do  as  much 
processing  as  we  can  prior  to  converting  to  electronics. 
We  will  discuss  the  type  of  processing  that  can  be  done 
later  in  the  paper.  However,  our  objective  is  to  process 
the  optical  data  to  the  fullest  extent  possible  so  that,  on 
conversion,  the  data  rate  will  be  within  the  capabilities 
of  the  electronic  computer  but  more  content  rich.  In 
this  way  we  hope  to  increase  the  performance  of  the 
system without disturbing the large investment in system
and user software.


IV.  Storage  and  Processing  Strategy 

A. Optical Disks

By far the most popular form of optical storage is
optical disks. These range from CD-ROMs to very
large disk units which allow for massive storage of
data.5,6 Optical disks have a far greater capacity than
their magnetic counterparts but have much slower
access and transfer rates. This is primarily due to the
mass of the read head and the slower revolution rate.
However, while magnetic disks appear to be approaching
technological limits with regard to access time and
transfer rates, optical disks have great potential for
vast improvements. This is true primarily because of
the relatively large distances between the read mechanism
and the disk surface. Through multibeam reading
the potential exists for massive data transfer rates
of the order of 300-500 Mbytes/s, a full 2 orders of
magnitude over current magnetic disks.


B.  Content  Addressable  Memory 

There  have  been  several  laboratory  demonstrations 
of  associative  memory.7-9  Variations  of  this  approach 
may  be  applicable  for  a  content  addressable  memory. 

One technique, using holographic memory, is to
store the data holographically and to provide feedback
with gain. This system, illustrated schematically in
Fig. 3, operates as follows: The combination hologram
and resonant structure has transverse resonant modes
that are defined and limited by the images stored in the
hologram; that is, only resonant modes corresponding
to a holographic image are possible. A partial image is
fed into the hologram, which causes a more complete
image to be reconstructed from the hologram. This
image receives gain in the nonlinear medium, and the
resonant structure resonates with that image. If a
portion of another image is input into the system, the
transverse mode associated with that image becomes
dominant, and after a few passes around the closed
path with gain, that image is fully recalled. If more
than one stored image contains the image portion that
is fed into the system, one image will start to dominate
due to greater correlation with the input or the
characteristics of the noise in the system and, once the system
locks onto that mode (image), it stays on that image.

In a content addressable memory, it is desired that
sets of data with a common part be retrieved. For
example, if the word Colorado is input to a database of
Optical Society members, it should be possible to
retrieve all appropriate names of members in Colorado
either in series or parallel. In a database/knowledgebase
system, it is necessary that a partial input into a
file retrieve all components of that file having that
partial input. In currently demonstrated associative
memories, only one component would be retrieved and
the one retrieved would differ from time to time
depending on the noise state of the system. To change
the associative memory into a content addressable
memory suitable for database/knowledgebase systems
would require some means of recalling them all. A
perturbation of the system would be necessary to move
the resonant system to another transverse mode and
another output in the common file. Thus far, no such
demonstration has been made.

Fig. 3. Content addressable memory with feedback and holographic
storage.7
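
The distinction between the two kinds of recall can be stated in a few lines of software. The following is a purely illustrative analogy, not a model of the optical system: the member records are invented, and a random choice stands in for the noise that decides which single image a demonstrated associative memory locks onto.

```python
import random


def associative_recall(records, partial, rng=random):
    """Mimic a demonstrated associative memory: one matching record is
    recalled, and which one is left to chance (the 'noise state')."""
    matches = [record for record in records if partial in record]
    return rng.choice(matches) if matches else None


def content_addressable_recall(records, partial):
    """Recall every record that contains the partial input."""
    return [record for record in records if partial in record]


# Hypothetical membership records, invented for this example.
MEMBERS = [
    "A. Adams, Colorado",
    "B. Baker, Ohio",
    "C. Clark, Colorado",
    "D. Davis, Texas",
]

print(associative_recall(MEMBERS, "Colorado"))
print(content_addressable_recall(MEMBERS, "Colorado"))
```

The first call may return either Colorado record from run to run, while the second always returns both, which is the behavior a database/knowledgebase system requires.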

Another unresolved problem with holographic associative
memory (or any holographic storage system, for
that matter) is that the possible number of stored
images predicted by current theory is several orders of
magnitude greater than has been achieved experimentally.
More accurate analyses and simulations are
needed  before  these  discrepancies  can  be  resolved. 

C.  Page  Oriented  Holograms 
1.  Storage 

Most massive database and knowledgebase systems
store data on magnetic or optical disks and employ
indexing techniques to avoid or minimize disk accesses.
Various clustering and accessing techniques are
used to reduce response time. Even so, when the joint
requirements of very large databases and very short
response times are imposed, existing technologies
degrade considerably. In these cases, the ability to call
forth and operate on large pages of data in parallel
would offer a profound advantage over serial operation.
Some of the issues discussed below are also
addressed in Ref. 10.

The basic concept of page-oriented holographic
memory (POHM) is quite simple. Many small spatially
discrete holograms are recorded on a single substrate.
Some are constructed such that whenever a
laser beam is deflected to one of these holograms, the
output 2-D image falls on a common surface for all
holograms. Of course, the whole 2-D image arrives
essentially in parallel. A 1-mm hologram, properly
made, can store an array of \(10^{4}\)-\(10^{6}\) bits which is a page.
An electrooptic or acoustooptic deflector can address
any of these stored pages very rapidly (\(10^{-4}\)-\(10^{-6}\) s).
Access time is limited by laser deflection times
(\(10^{-9}\)-\(10^{-6}\) s) or parallel readout mechanism response time
[\(10^{-4}\)-\(10^{-2}\) s for currently available spatial light
modulators (SLMs)]. Using the worst (best) case numbers,
we can recall \(10^{4}\) (\(10^{8}\)) bit pages in any order from
among \(10^{4}\) (\(10^{6}\)) such pages at a rate of 100 (\(10^{4}\))
pages/s. Thus the capacity of POHM ranges from a few
megabytes to over a terabyte while the transfer rate
ranges from <1 Mbyte/s to >100 Gbytes/s. We can
place an optically addressed SLM at the output as an
image amplifier to read the page into the optical system
in parallel. Figure 4 shows this basic system along
with the laser readout system for the SLM.
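
The quoted capacity and transfer rate ranges follow from the page size, page count, and page rate. The sketch below reproduces that arithmetic; it takes the worst case as \(10^{4}\)-bit pages, \(10^{4}\) pages, and 100 pages/s, and the best case as \(10^{8}\)-bit pages, \(10^{6}\) pages, and \(10^{4}\) pages/s, which is the reading of the figures above that agrees with the stated megabyte-to-terabyte and Mbyte/s-to-Gbytes/s ranges. The function name is illustrative.

```python
def pohm_figures(bits_per_page, page_count, pages_per_second):
    """Return (capacity in bytes, transfer rate in bytes/s) for a POHM."""
    capacity_bytes = bits_per_page * page_count / 8
    rate_bytes_per_s = bits_per_page * pages_per_second / 8
    return capacity_bytes, rate_bytes_per_s


# Worst and best case parameters as read from the text (see lead-in).
worst = pohm_figures(10**4, 10**4, 100)
best = pohm_figures(10**8, 10**6, 10**4)

for label, (capacity, rate) in (("worst", worst), ("best", best)):
    print(f"{label}: capacity {capacity:.3g} bytes, rate {rate:.3g} bytes/s")
```

The worst case gives 12.5 Mbytes at 0.125 Mbyte/s and the best case 12.5 Tbytes at 125 Gbytes/s.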

What  the  SLM  produces  is  a  modulation  pattern  (in 
intensity,  phase,  polarization,  etc.)  but  light  comes 
only  from  the  portion  of  the  SLM  which  is  illuminated. 
Thus  if  we  illuminate  the  SLM  addressed  by  the 
POHM  with  light  from  a  second  SLM  (electrically 
addressed,  intensity  modulated),  we  can  restrict  entry 
into  the  optical  system  to  those  portions  of  the  page  of 
immediate  interest.  Figure  5  shows  this  part  of  the 
system. For read-only POHMs, photographic or other
conventional storage methods can be used.

Multiplexed holograms can also be stored in 3-D
photorefractive crystals.11,12 Two schematics of
photorefractive memory are shown in Fig. 6. In Fig. 6(a),
the ith image is stored by interfering the input image
with the reference (pump) beam when the photorefractive
crystal is rotated to a specific angular position.
To read out the ith image from the photorefractive
memory, the input is turned off and the reference
beam turned on when the crystal is rotated to the
specific angular position. In Fig. 6(b), the ith image is
stored by interfering the input with a reference beam
of the ith phase code. (The photorefractive crystal
need not be rotated in this alternate scheme.) To read
out the ith image from the photorefractive memory,
the input is turned off and the reference beam of the
ith phase code is turned on.

Presently, photorefractive crystals require milliseconds
to store a hologram. The hologram writing time
can be reduced by using higher intensity beams. It can
also be reduced for strontium barium niobate (SBN)
by applying an electric field across the crystal. Research
is under way to reduce the hologram writing time
by increasing impurity doping levels in the
photorefractive crystals. To retrieve a stored image from
a photorefractive memory, the time required can be
much shorter than milliseconds and is determined by
how fast the photorefractive crystal can be rotated to
the desired angular position in scheme (a) or how fast
the reference beam can be switched from one phase
code to another in scheme (b). Using SBN:60, hundreds
of page-oriented holograms can be stored and
retrieved in real time.

Holographic storage is far from perfected despite
many millions of dollars of effort expended all over the
world in the 1970s. Uniformity among output pixels is
seldom better than 10-15% and signal-to-noise ratios can
be low but, outside the Soviet Union, little work has
been performed on POHMs this decade. Great
improvements arising from subsequent advances in
holography and SLMs may be expected.
Fig. 4. Page selection and readout. The deflector addresses a
single hologram on the POHM and a page of data is written in
parallel onto the output laser beam.

Fig. 5. Output selection. A second SLM (SLM2, shown here as
transmissive) is used for selective illumination of the output from
SLM1.

Fig. 6. Photorefractive memory: (a) multiple holograms are
stored and retrieved as a function of crystal angular positions and (b)
multiple holograms are stored and retrieved as a function of phase
coded reference beams.

Fig. 7. Pattern selection. We can use electronically addressed
SLM3 to write a reference pattern to be matched with the light
coming from POHM-driven SLM1. Fourier transform lens 1 jointly
transforms both outputs onto SLM4 where they add coherently.
Another readout beam addresses SLM4 which produces bright light
on the output plane where the reference pattern is matched with a
portion of the input beam from SLM1.


2.  Processing 

It will be of interest to search the illuminated portion
of the retrieved page in parallel for space-invariant
pattern recognition. Fourier optics is known to be
excellent for this purpose if we know ahead of time
what pattern(s) we want to recognize and prepare
appropriate pattern recognition masks. While this technique
will be useful in certain cases, it offers insufficient
flexibility for general DB/KB purposes, so we
must use other techniques such as joint transform
correlation.11-15 To do joint transform correlation, we
generate a reference pattern on yet another SLM and
use one lens to jointly Fourier transform both images
which must be illuminated by the same laser in such a
manner that they are mutually coherent in the Fourier
plane. There they strike yet another SLM which is
read out by yet another laser beam. That laser beam,
after reflection from the SLM, is again Fourier transformed
to produce an output which resembles the input
page but is bright only where the reference pattern
appears in the page. This output pattern must be
thresholded optically (in parallel) or electronically.
Figure 7 shows this arrangement.
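
The joint transform correlation steps described above can be illustrated numerically. The following is a minimal one-dimensional sketch, not the optical system itself: a naive discrete Fourier transform stands in for each Fourier transform lens, squaring the magnitude stands in for the intensity response of the SLM in the Fourier plane, and bits are encoded as amplitudes of +1 and -1 (an assumption made here so that only exact matches reach the peak value). The function names, the 16-bit page, and the 4-bit reference are illustrative.

```python
import cmath


def dft(values):
    """Naive discrete Fourier transform (stands in for a transform lens)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * j * k / n)
                for k, v in enumerate(values))
            for j in range(n)]


def to_bipolar(bits):
    """Encode bits as amplitudes: 1 -> +1.0, 0 -> -1.0 (assumed encoding)."""
    return [1.0 if bit else -1.0 for bit in bits]


def joint_transform_matches(page_bits, ref_bits):
    """Return page positions where the reference pattern occurs exactly."""
    page, ref = to_bipolar(page_bits), to_bipolar(ref_bits)
    page_len, ref_len = len(page), len(ref)
    offset = page_len      # separation between reference and page
    n = 4 * page_len       # zero padding keeps correlation terms apart
    joint = [0.0] * n
    joint[:ref_len] = ref                      # reference pattern
    joint[offset:offset + page_len] = page     # retrieved page
    # First transform, then square-law detection in the Fourier plane.
    power = [abs(c) ** 2 for c in dft(joint)]
    # Second transform yields the correlation plane (scaled by n).
    corr = [c.real / n for c in dft(power)]
    # Threshold: the cross-correlation term starts at lag `offset`.
    return [p for p in range(page_len - ref_len + 1)
            if corr[offset + p] > ref_len - 0.5]


PAGE = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0]
REFERENCE = [1, 0, 1, 0]
print(joint_transform_matches(PAGE, REFERENCE))
```

The reference and page are placed side by side in one input plane, so the cross-correlation appears displaced from the autocorrelation terms by their separation, and thresholding that region marks where the reference occurs in the page.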

Finally, data must be copied from the page onto a
scratchpad memory. Where possible, this too should
occur in parallel. If we assume that we can accumulate
data over time in parallel on a 2-D charge-coupled
device (CCD) array for eventual CCD readout, the
problem becomes one of illuminating only the right
part of the page (a problem discussed earlier) and
deflecting that light to the right part of the detector
array. A fast (microsecond) 2-D acoustooptic image
scanner should be ideal for this task. Figure 8 shows
how this variable spacing grating can be used in this
manner.

Omitted from this discussion of basic methods are
drawings of how all the parts fit together in one system.
We believe that this will be complicated but perfectly
feasible using beam splitters (amplitude and polarization),
reflex mirrors, multiple POHMs, etc. The system,
at least initially, will be large and expensive, but
the users of very large DB/KB systems are used to size
and cost now. What optics adds is high speed.

Fig. 8. Image shifting. The 2-D AO cell shifts the image in any
direction by changing the direction of light in the Fourier plane.

3.  Two-Dimensional  Access 

It is possible, in principle, to move both the medium
and the beam in such a way as to use the fastest
available 1-D scanners, e.g., chirped acoustooptic cells,
with much slower medium translators to interrogate a
2-D data array at a bit rate approaching that of the fast
scanner. Suppose an N-bit fast-scan horizontal pattern
is available. Let the medium move vertically at a
speed $S_m$. Let the fast acoustooptic scan speed be
$S_a \gg S_m$. Then, by tilting the scan direction at an angle

$\theta_s = \tan^{-1}(S_m/S_a) \approx S_m/S_a \ll 1$

from the horizontal we can sweep out a horizontal path
at speed $S_a$. By making rows correspond to attributes
or to objects in a relational database, this method could
allow up to gigahertz access to the interesting part of a
database.
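
The tilt-angle relation above can be checked numerically. The following is a minimal sketch only; the two speeds are hypothetical values chosen for illustration and do not come from the paper.

```python
import math


def tilt_angle(medium_speed, scan_speed):
    """Return the scan tilt angle in radians, theta_s = atan(S_m / S_a)."""
    if scan_speed <= 0:
        raise ValueError("scan_speed must be positive")
    return math.atan(medium_speed / scan_speed)


# Hypothetical speeds (assumptions for illustration, not from the text).
S_m = 1.0      # medium translation speed, m/s
S_a = 1000.0   # acoustooptic scan speed, m/s

theta = tilt_angle(S_m, S_a)
small_angle = S_m / S_a   # small-angle approximation used in the text

print(f"theta_s = {theta:.9f} rad, S_m/S_a = {small_angle:.9f}")
```

With a speed ratio of this size the exact angle and the small-angle approximation are nearly indistinguishable, which is why the text can treat the tilt as a small correction.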

By far the fastest access to a random bit and the
most bits read out in parallel result from page-oriented
holographic memories discussed earlier16 and shown in
Fig. 4. Let there be $H^2$ holograms ($H \times H$ array) each
presenting a $B \times B$ bit array to the SLM when illuminated.
The maximum deflection time is $t_p$. Clearly,
any particular bit from the $H^2 B^2$ number of stored bits
can enter the optical system in an access time

$T = \max(t_p, t_s)$,

where $t_s$ is the SLM response time. Probably, $T = t_s$
can now be $10^{-6}$ s. For $H = B = 10^3$ (a very large
POHM since the individual holograms must be 1-2
mm), we have $10^{12}$ bits accessible in $10^{-6}$ s or $10^{18}$ bits/s.
Even if we immediately convert the data to serial
format, we still have access at $10^{15}$ bits/s. Nevertheless,
the best course, if feasible, is to keep the page
operations parallel and, hence, optical for as long as
possible.
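
The capacity and access figures quoted above follow from simple arithmetic. The sketch below reproduces the stored-bit count and the access time; the deflection time passed in is an assumed value (the text states only the SLM response time), and the function name is hypothetical.

```python
def pohm_figures(H, B, t_p, t_s):
    """Return (stored bits, access time in s, accessible bits per second).

    H x H holograms each hold a B x B bit page; the access time is the
    slower of the deflector (t_p) and the SLM (t_s).
    """
    stored_bits = (H ** 2) * (B ** 2)
    access_time = max(t_p, t_s)
    return stored_bits, access_time, stored_bits / access_time


# H = B = 10^3 and t_s = 1 microsecond are taken from the text.
# t_p = 0.5 microsecond is an assumption made only for this example.
stored_bits, T, rate = pohm_figures(H=10**3, B=10**3, t_p=0.5e-6, t_s=1e-6)

print(stored_bits)   # total number of stored bits
print(T)             # access time in seconds
print(rate)          # stored bits reachable per second of access time
```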

Another approach is to select one of an array of
holograms as before but allow each hologram to store
multiple images.17 If the images are angularly multiplexed,
a 2-D acoustooptic cell at or imaged onto the
hologram can allow selection of the desired wavefront.
If the images are wavelength multiplexed we must
adjust the deflector and tunable source jointly. A
hologram storing N images reduces the POHM area by
a factor of N. However, multiplexed holograms experience
reduced signal-to-noise ratio or dynamic range.

D. Spatial Light Modulators

There are presently several spatial light modulators
(SLMs) which exhibit optical memory characteristics
and may be considered for page-oriented memory applications.
They are microchannel spatial light modulators,18,19
ferroelectric liquid crystals,20-23 multiple
quantum wells,24-27 silicon-electrooptic modulators,28
and thermoplastics.29

1.  SLM  Storage 

Fast, high density reprogrammable electronic memories
are widely available. The electronic memory can
be divided into N (n × n) cells in an array format. If
optical inputs and outputs in the form of phototransistors
and optical modulators, respectively, can be added
to each of the N cells of the memory array, we can
obtain an optically accessible N-port memory SLM
where N is the number of optical input-output ports.
This memory will be page oriented because all N ports
arranged in the array format can be accessed in parallel.
The number of memory circuits in each cell will be
the depth of the N-port memory.

To provide the optical inputs and outputs for the N-port
memory, research on phototransistor design and
silicon-electrooptic material integration has been performed.28,30-32
With PLZT as the modulator material,
it is estimated that memory access time of 1 μs is
attainable. Low loss polarization switching at microsecond
rates has been demonstrated with ferroelectric
liquid crystals (FLCs)20 and photo-addressed FLC
SLMs have been demonstrated.23 New electrooptic
materials such as organic polymers and GaAlAs or InP
multiple quantum well structures26,27 are currently being
studied for access time improvements.

Depending on how the electronic memory in each
cell is organized, the N-port memory can be accessed
by address or by content. Depending on which and
how many cells of the N-port memory are activated,
pages of information can be retrieved in parts or in
their entirety.

2.  Processing 

It is also possible to perform logic using the SLM.33-36
If many (or some) of the memory circuits in each cell of
the N-port memory SLMs are replaced by logic circuits,
we obtain an N-port processing SLM, which can
combine the processing power of silicon electronics
and the communication or interconnection capability
of optics. Depending on the design of logic circuits in
each cell, important processing operations such as
comparison and matching between new and stored
data can be performed in parallel. Furthermore, if the
N-port processing SLM can serve as the input of an
optical matrix-tensor multiplier, we can perform parallel
search for artificial intelligence.37,38 By allowing
parallel readout from mass optical storage to address
in parallel the n × n ports of an N-port processing SLM
(with or without an optical matrix-tensor multiplier
attached), many well-defined processing functions can
be performed at high speed. Then, the electronic data
converted from such processed optical data will have
much lower data rates on the average, and the data will
be much richer in information.

Fig. 9. Pattern substitution.

E.  Symbolic  Substitution 

In addition to parallel optical readout and parallel
optical data comparison, it is also desirable to include
more complex optical processing operations, such as a
search with wildcards or a conditioned search before
the data are transferred to the electronic system. Certain
requirements have to be met, however, by an
optical preprocessor for it to be applicable to database
systems.

1.  Requirements  for  an  Optical  Preprocessor 

In response to a user query, a large number of stored
pages are often called up from secondary storage even
though generally a large percentage of these pages are
not of interest. To reduce the information presented
to the electronic system, it is necessary to provide
parallel digital optical processing involving memory
functions and programmability and to match the processing
rate at which data are read from the storage
medium. A pipeline architecture is advantageous because
the processing rate is more important than the
pipeline delay. The length of the pipeline can serve to
adapt the complexity of operation to the requirements.

In addition to this time parallelism provided by the
pipeline architecture, spatial parallelism matching the
page size on the storage medium is desirable for processing.
A typical format consists of 2-Kbyte data
pages corresponding to 16 Kbits or a 128 × 128 size
pixel array. Operation on these arrays must occur at
the readout rate and be rich enough to perform useful
work. Finally, the optical processor should not be
fixed but programmable to adapt it to various demands.
2. Principle of Symbolic Substitution

A well-developed technique for optical digital processing
is symbolic substitution.39,40 This logic is able
to emulate Boolean logic, cellular logic, arithmetic, and
Turing machines. Recently a functionally programmable
module was proposed.41 The elementary operation
of symbolic substitution is pattern substitution
as indicated in Fig. 9. Each occurrence of the search
pattern in the input plane is marked in the intermediate
plane by a bright dot. In the substitution phase
each of these dots is replaced by the substitution pattern.
These primitive operations are easy to implement
optically and, with future developments in optical
devices, can be executed extremely fast, because
the technique can be applied in parallel.

Symbolic substitution operates on binary matrices.
Logic can be performed by transforming all the occurrences
of a given spatial configuration of binary elements
into a different spatial configuration as shown
in Fig. 10. Several different transformations can also
be implemented optically in parallel. One optical pattern
transformation block consists of a recognition
part, an optical inverter array, and a substitution part.
Both the recognizer and the substituter parts are passive
optical components and are matched to their corresponding
search pattern and replacement pattern,
respectively. The inverter array is the active component,
responsible for thresholding and optical power
regeneration.
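
The two phases described above (recognition, then substitution) can be imitated in software. The following is a sketch under stated assumptions, not the optical implementation: it works on small lists of 0s and 1s, matches both dark and bright pixels of the search pattern exactly, and uses hypothetical function names `recognize` and `substitute`.

```python
def recognize(plane, pattern):
    """Return the (row, col) origins at which pattern occurs in plane.

    Each returned origin plays the role of a bright dot in the
    intermediate plane.
    """
    rows, cols = len(plane), len(plane[0])
    p_rows, p_cols = len(pattern), len(pattern[0])
    hits = []
    for r in range(rows - p_rows + 1):
        for c in range(cols - p_cols + 1):
            if all(plane[r + i][c + j] == pattern[i][j]
                   for i in range(p_rows) for j in range(p_cols)):
                hits.append((r, c))
    return hits


def substitute(shape, hits, replacement):
    """Write the replacement pattern at every marked origin."""
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for r, c in hits:
        for i, row in enumerate(replacement):
            for j, bit in enumerate(row):
                if r + i < rows and c + j < cols:  # clip at the array edge
                    out[r + i][c + j] |= bit
    return out


# Illustrative data (assumed): a 4 x 4 input plane and 2 x 2 patterns.
plane = [[1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
search = [[1, 0],
          [0, 1]]
replacement = [[0, 1],
               [1, 0]]

dots = recognize(plane, search)
result = substitute((4, 4), dots, replacement)
```

The loops here are sequential only because this is a software model; in the optical scheme every position of the plane is tested at once.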

Processing can be achieved by applying several different
pattern transformations, also called substitution
rules, simultaneously. The parallelism of optics
thus is used at a low level to increase speed rather than
for high level parallel processing. Although these pattern
transformations are global or space invariant operations
(the same rules apply to all locations on the
array) it has been shown that this mechanism is also
able to support local operations.

The time for an N-rule pattern transformation is
independent of the number of rules and is given by the
propagation time of light through the setup and by the
response time of the inverter array. For very fast
processing, the propagation time, which could be of the
order of 1 ns, could be comparable with the switching
time of the inverter array. The propagation time corresponds
to the latency of a pipeline processor whereas
the throughput depends on the switching time of the
inverter array. At high data rates, it is necessary to
avoid clock skew. Symbolic substitution supports interconnects
with a latency that is constant down to
femtoseconds.

Symbolic substitution also supports constant fan in
and constant fan out gates, because the substitution
rules, specifying the search and the substitution pattern,
are fixed. This feature is important because
large fan out implies high power consumption and high
clock rates can be achieved only if the gates are optimized
with respect to a small and constant number of
inputs and outputs.


Fig. 10. EXCLUSIVE OR with symbolic substitution (panels: logic function; substitution rules).

Fig. 11. Functionally programmable module.


Architecturally, one substitution rule is implemented
by one optical module. Several modules implementing
different substitution rules can be arranged either
in parallel or in sequence, thus forming an architecture
for a processor. The functionally programmable module42
consists of a series of transformation blocks to
perform controlled shift operations and to perform
logic as shown in Fig. 11. Every bit in the array can be
programmed to move in four possible directions. The
logic set includes EXCLUSIVE OR, AND, and the identity
operator. The program for this module is interlaced
with the data and enters as a stream of optical bit
arrays.

In an optical parallel pipeline processor, two types of
parallelism exist. The first type concerns the parallel
processing of many data within a 2-D processing array.
This type may be called spatial parallelism. In a pipeline
there is also a second type of parallelism. In each
stage of the pipeline, an array of data is processed
simultaneously, typically by different operations.
The degree of parallelism in a pipeline is given by the
number of stages. This type of parallelism may be
called time parallelism. For a database processor both
types of parallelism are applicable. If the processing
array is the same size as the readout array, the processing
stages have to be cascaded. Between those extremes
any trade-off between lateral and longitudinal
complexity is feasible.
V. Conclusions

Very large databases and knowledgebases (VLD/KB)
are at the heart of existing information systems
and will play an even more prominent role in the future.
These systems place extreme requirements on
existing electronic digital computer technology, requirements
that are often not met when VLD/KB are
present or real time responses are required.

Optics with its high speed and bandwidth has much
to offer for the solution of very large database and
knowledgebase problems. In terms of storage, optical
disks can hold an order of magnitude more data than
magnetic disks per unit area. Although optical disks
are currently slower than magnetic disks, the potential
exists for at least 2 orders of magnitude greater data
rates with multibeam reads. This potential, if realized,
would completely change the way database and
knowledgebase problems (as well as others) are solved.
In addition to optical disks, page-oriented holographic
memories hold considerable potential for performance
improvement of the solution to these problems. In
addition to storing massive components of data, they
offer ways to perform processing functions during data
retrieval.

The transport of digital data via optical fiber is well
developed and its advantages over electronic transport
are well recognized. It now appears feasible to remove
data from storage and send it through fibers to optical
processors without having to convert from photons to
electrons. This would have significant performance
advantages, especially if data can be read from storage
at hundreds of megabytes per second. Such rates
would flood current electronic systems since they are
designed for magnetic disk rates, which are ~3
Mbytes/s. This leads to new electronic systems as well
as new optical systems. Since the data are already in
optical form, there are considerable advantages to processing
it optically before conversion to electronic
form. More research and development of digital optical
processors that perform data and knowledge base
functions in parallel is needed.

In this paper we have considered many of the ways
that optics can play a role in the increase in performance
of database and knowledgebase systems. We
believe that there is considerable potential for improvement
and hope that this paper helps to encourage
active research in this important area.
Abstract


In the twenty-two years since VanderLugt's introduction of holographic matched filtering, the intensive
research carried out throughout the world has led to no applications in complex environments. This leads one
to the suspicion that the VanderLugt filter technique is insufficiently complex to handle truly complex
problems. Therefore, it is of great interest to increase the complexity of the VanderLugt filtering operation.
We introduce here an approach to the real time filter assembly: use of page oriented holographic
memories and optically addressed SLMs to achieve intelligent and fast reprogramming of the filters using a
104  to  10*  stored  pattern  base. 


Introduction 

Whether the twenty-two years of research on VanderLugt's filtering has been successful or not depends on
how one defines success. From a researcher's point of view, it has been very successful. Literally hundreds
of Ph.D. theses have been written. Many papers have been written. Simply reviewing the review articles
would be a significant task. Therefore, it has been a successful topic in generating research work.
If, however, success means application of this technology in the field from which most of the money has
come (military applications), then the field has been far from a success. This paper has two goals.
First, it seeks to offer an explanation of the apparent failure. Second, it offers a new approach which
attacks the problem identified.


Complexity

In numerical calculations, the word complexity has a well-defined meaning. If we regard the operation of
pattern recognition as a well defined numerical operation, we could define the complexity of that operation.
On the other hand, there are a variety of pattern recognition schemes ranging from correlation with a single
prototype to correlation with a large bank of prototypes to far more complex operations perhaps involving
motion of the mask and/or object. The scene itself has complexity. One measure of this is its information
content. This, however, is somewhat misleading. If we mean by complexity the difficulty of the problem,
the difficulty arises not just from the amount of information that can be packed into a scene but from the
within-class and between-class variations of realistic objects. If we model human pattern recognition as a
syntactic process with a vast store of rather flexible prototypes, pattern recognition is probably an NP
problem.

The point of all this is that realistic problems involve tremendous variations among a vast number of
possible prototypes. The idea that one or even a bank of a thousand filters could be adequate to such a
task seems, on the surface, highly improbable. There is simply not enough stored information to do the task
properly. I believe that this is one of the fundamental reasons VanderLugt filtering has failed to give
adequate results for truly realistic complex situations. If this analysis is correct, there is only one
possible solution: vastly increase the information available to do the filtering.

Exhaustive Versus Nonexhaustive Search


If we are to store and search a truly vast amount of information, we must reexamine the previous inclination
toward exhaustive search of the memory. Clearly, human beings do not employ exhaustive search in their
pattern recognition. In reading these words you are searching known patterns of English letters and words
using the context of knowing that this is a paper on optical pattern recognition being written in English
and, for some of you, even knowing something of the style of the author. Therefore, you do not have to be
searching that part of your memory which deals with the names of your pet dogs or of words in foreign languages
or of the map of your city. This represents a compromise between speed and thoroughness. That compromise
can be accomplished in many ways. Nevertheless, the important thing for these purposes is to
recognize that the compromise was necessary and wise.

The Applications of Page Oriented Holographic Memories
As  is  known  to  a  great  many  of  the  readers  of  this  paper,  page  oriented  holographic  memories  allow  stor- 
itself can be changed. Currently, this is in the region of 1 millisecond per frame, but one microsecond
frame time spatial light modulators are currently being constructed at various locations. Thus, at the
extreme, we could do an exhaustive search of a million spatial light modulator patterns in the time of one
second. Because one second is usually too slow and also because we do not wish to digest that much information
each second, it seems prudent to consider intelligent nonexhaustive searching.
Utilization of Stored Information for Pattern Recognition

Because  the  spatial  light  modulator  is  limited  to  the  range  of  10*  to  10*  pixels,  we  must  be  clever  in 
the  way  we  use  the  stored  information.  By  comparison  with  a  film  hologram,  for  instance,  the  spatial  light 
modulator may contain only a fraction of a percent of the information content. Two approaches seem reasonable.
One is to design pattern recognition kinoforms using yet-to-be-delineated rules for working in this
pixel impoverished environment. The other approach is to store not the filter but the object whose complex
conjugated Fourier transform is the filter. We use that "image" to be one input in a joint transform correlator.
The balance of this paper is written in such a way as to be independent of that particular choice.

Mask  Management 

Perhaps the most important feature of the massive memory optical pattern recognizer is the intelligent
use of the stored data, that is, the plan for appropriate nonexhaustive search. Because random access to
any of the stored patterns can be achieved so rapidly, the physical arrangement of holograms on the common
substrate is essentially of no importance. On the other hand, the data base management in the electronic
domain that determines which hologram is addressed at any instant needs to be designed very carefully.
Thus, as is often the case, it is electronics and not optics that represents the ultimate limitation. We
now explore some of the possible data base management concepts which are appropriate for optical pattern
recognition.

It may be that what we want to do is arrange the memory in terms of object parameter variations. Those
variations may be due to range and/or orientation of the object relative to the observation system. Also
stored might be the wavelength of those patterns. In any case, there is a multidimensional parameter space
which must be searched. This would appear to require a multidimensional tessellation of that space after
appropriate scaling and distortion of that space to reflect importance and realistic variability. What is
then needed is a lookup table which transforms parameter sets to x-y deflection to call forth the proper
mask information from the hologram. Thus we must design a sensible map from the many dimensional space to
whatever arbitrary two dimensional pattern we have used to store the data on the hologram. It is not my
purpose to discuss the design of multidimensional lookup tables in this paper.
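
Although the design of such tables is outside the scope of the paper, the kind of mapping meant can be made concrete with a deliberately simple sketch. Everything here is an assumption for illustration: a uniform tessellation (no scaling or distortion of the parameter space), the axis ranges and bin counts, the grid width, and the function names.

```python
def bin_index(value, low, high, bins):
    """Quantize value in [low, high] into one of `bins` equal cells."""
    if not low <= value <= high:
        raise ValueError("value outside tessellated range")
    index = int((value - low) / (high - low) * bins)
    return min(index, bins - 1)   # keep value == high in the last cell


def deflection_for(params, axes, grid_width):
    """Map a parameter set to a (row, col) hologram position.

    axes holds one (low, high, bins) triple per parameter. The cell
    indices are combined into a single flat index, which is then laid
    out row by row on a hologram grid of width grid_width.
    """
    flat = 0
    for value, (low, high, bins) in zip(params, axes):
        flat = flat * bins + bin_index(value, low, high, bins)
    return divmod(flat, grid_width)


# Hypothetical parameter space: range 1-10 (4 cells), orientation
# 0-360 degrees (8 cells), stored on a hologram grid 8 positions wide.
axes = [(1.0, 10.0, 4), (0.0, 360.0, 8)]
position = deflection_for((5.5, 90.0), axes, grid_width=8)
```

Because the physical arrangement on the substrate is said to be unimportant, the row-by-row layout used here is arbitrary; only the table itself needs care.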

One obvious use of the hologram or holograms is to do exhaustive search by category. The hologram or
holograms can be organized in such a way that they are restricted in category or context. For instance, in a
military environment, one might wish to apply entirely different sets of masks for target acquisition, target
tracking, and terminal homing. These can represent separate regions on the hologram or even separate
holograms. Again, it is not so much the organization of the hologram as the organization of the electronic
addresser that is of importance. If the number of contexts or categories is sufficiently large and sufficiently
fuzzy, it may be that it is sufficient to specify a context and do exhaustive searching within that
context. This represents a two level organization. The first level is to determine the appropriate context.
At the second level, we simply do exhaustive searching. From here, it is not hard to generalize to a
multilevel search. Broad contexts are sought and then narrow contexts sought within those. This establishes
a tree structure.
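
The multilevel organization described above can be pictured as a nested table held by the electronic addresser. The sketch below is illustrative only: the context names, mask numbers, and the function `masks_under` are hypothetical, and the leaves simply list the masks to be searched exhaustively once a context has been chosen.

```python
def masks_under(tree, path):
    """Return, sorted, every mask stored beneath the context named by path."""
    node = tree
    for key in path:            # descend from broad to narrow context
        node = node[key]
    found = []
    stack = [node]
    while stack:                # collect all masks below the chosen node
        item = stack.pop()
        if isinstance(item, dict):
            stack.extend(item.values())
        else:
            found.extend(item)
    return sorted(found)


# Hypothetical two-level context tree; integers stand for hologram masks.
contexts = {
    "acquisition": {"air": [0, 1], "ground": [2, 3]},
    "tracking": {"air": [4], "ground": [5, 6]},
}

broad = masks_under(contexts, ["acquisition"])
narrow = masks_under(contexts, ["tracking", "ground"])
```

Choosing a narrower context shortens the list of masks that must be run through the correlator, which is the speed-for-thoroughness compromise discussed earlier.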

It is possible to consider composite masks or, as variously termed, "linear combinations of matched
filters" (1) or "composite matched filters" (2). We accomplish positive weightings by sequencing through
all of the positively weighted components and varying, for example, the intensity of the laser beam. The
time integrated correlation plane pattern is then stored. Next, we generate the sum of negatively weighted
components in a second memory. Finally, we subtract the two integrated images to obtain the desired
results. We then have the potentiality of generating quite general filters simply by controlling the
weightings. Implicit in this is the assumption that we can run through a wide variety of masks very rapidly.
Even with the slowest of current spatial light modulators, it takes only two frames to do a general composite
matched filtering. That is because we are averaging or integrating during the entire cycle. The
weights can be predetermined or even adaptively determined.
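
The two-memory procedure can be written out as a short numerical sketch. It is a software analogy under assumptions: each correlation plane is given as a small list of intensities, the laser intensity is modelled as a non-negative gain, and the name `composite_correlation` is hypothetical.

```python
def composite_correlation(planes, weights):
    """Combine correlation planes with signed weights using two accumulators.

    Positively weighted planes are integrated in one memory, negatively
    weighted planes in a second memory, and the two are then subtracted.
    """
    rows, cols = len(planes[0]), len(planes[0][0])
    positive = [[0.0] * cols for _ in range(rows)]
    negative = [[0.0] * cols for _ in range(rows)]
    for plane, weight in zip(planes, weights):
        target = positive if weight >= 0 else negative
        gain = abs(weight)      # light intensity cannot be negative
        for r in range(rows):
            for c in range(cols):
                target[r][c] += gain * plane[r][c]
    return [[positive[r][c] - negative[r][c] for c in range(cols)]
            for r in range(rows)]


# Illustrative 1 x 2 correlation planes and weights (assumed values).
planes = [[[1.0, 2.0]], [[3.0, 1.0]]]
weights = [2.0, -1.0]
combined = composite_correlation(planes, weights)
```

The split into two accumulators mirrors the physical constraint that a detector integrates only non-negative light, so the sign of a weight has to be applied electronically after integration.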

Conclusion 

The material just presented is an outline of one approach to the vast increase in complexity that is probably
needed to make optical pattern recognition practical for many purposes. Because the page oriented
holographic memory offers storage and access capabilities far beyond those which can be offered electronically, the
value of optics is enhanced. That is, this is a clear illustration of a case in which optics can make practical
what would be essentially impossible electronically. As with any new solution, this one carries with
it a great many new problems. I hope that the outline of these problems is greeted as an opportunity for
invention and not as an excuse for inaction. Any time we have an opportunity to do something important that
cannot be done electronically, we should explore that opportunity carefully.
PAGE ORIENTED HOLOGRAPHIC MEMORY ADDRESSING OF OPTICAL BISTABLE DEVICES ARRAYS

Abstract

Page Oriented Holographic Memories can be used as stored microprograms. Because they may not give
highly accurate signal levels, e.g., to 0.1% as may be needed, these holograms should address not a device
array but an array of optical bistable shutters through which adjustable stable light beams may pass.

Page Oriented Holographic Memories


Their  Glorious  Past 


There was a time in the mid 1960's and 1970's when page oriented holographic memories (hereafter, POHM's)
held center stage in the world's holography effort. The United States efforts included massive efforts by
IBM (Lohmann, Bryngdahl, etc.) and AT&T (Collier, Burckhardt, Lin, Anderson, etc.). The most massive
European effort was by Philips. The hope was to build a read-write memory of gigantic storage capacity
(say, 101# bits) with exceptional random access time (say, 10-* seconds). This 10” or better bit per second
random access memory was to feed the hoped for supercomputer of the 1980s.

The basic concept (1) is extremely simple. The first Figure illustrates it schematically. Many small
holograms were to be formed on nonoverlapping areas of the same substrate. A 1 mm diameter subhologram might
produce an image of a 1024 x 1024 array of on-off points. The real image from each subhologram is formed
at the same place in space. At that place an array of parallel read out detectors was to be placed. By
deflecting a laser beam so that the proper hologram was illuminated, we could cause any one of the stored
point patterns to hit the parallel detector array essentially instantly. Deflectors of nanosecond random
access capabilities were built. Careers were built. Soon, however, the projects were abandoned.

What went wrong? In a word: everything. The search for a suitable read-write material failed. The
ultimate use failed to materialize as did arrays of 1024 x 1024 parallel readout 1 nanosecond detectors. It
was never clear how supercomputers could use that much data at that rate.

Their  Inglorious  Present 

Across the world, POHMs are dead. It is not even a serious research field. The sole exception to this
sad tale is the Soviet Union, which has what appears to be a significant effort in this field. Someone, we
or they, is wrong.

Why are the Soviets doing this? If we only read their English language publications (2), the answer
becomes clear.

Their  Glorious  Future 


The future of POHMs is optical computing. We use the POHM to reprogram an optically addressed spatial
light modulator (SLM). SLMs are, at last, becoming fast. Parallel read in and parallel read out are trivial.
Furthermore, for many purposes, a read only POHM suffices. Every objection to the POHM from the
1960s has vanished in the 1980s.

Uses With Optical Bistable Device (OBD) Arrays


Enabling 

We may wish to use the OBD array for blocking unwanted interconnections. For instance, an optical crossbar
could be formed using an N x N OBD array to perform 1 to 1 connection of N sources to N receivers (3).
For each of the N! possible interconnections there is precisely one on-off pattern for the OBD array that
achieves it. The signal beams themselves can be relatively weak so that even N of them will not switch the
OBD. A POHM could then be used to switch (enable) the appropriate OBDs. Wavelength, polarization, angle,
or some combination of these can separate the signal beam from the stronger enabling beam. Parallel
addressing makes the reprogramming fast (limited by either the OBD or the light deflector).
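
The correspondence between a one-to-one interconnection and an on-off pattern can be illustrated with a small sketch. It assumes a convention in which element (s, r) of the N x N array is on when source s is connected to receiver r; the function name `obd_pattern` is hypothetical. Counting the distinct patterns for a small N shows one pattern per permutation, N! in all.

```python
from itertools import permutations
from math import factorial


def obd_pattern(destinations):
    """Return the N x N on-off pattern connecting source s to destinations[s]."""
    n = len(destinations)
    if sorted(destinations) != list(range(n)):
        raise ValueError("destinations must be a permutation of 0..N-1")
    return [[1 if destinations[s] == r else 0 for r in range(n)]
            for s in range(n)]


# Source 0 -> receiver 2, source 1 -> receiver 0, source 2 -> receiver 1.
pattern = obd_pattern([2, 0, 1])

# Every one-to-one connection of 3 sources gives a different pattern.
distinct = {tuple(map(tuple, obd_pattern(list(p))))
            for p in permutations(range(3))}
```

In the scheme described in the text, each such pattern would be one stored page of the POHM, so selecting a hologram selects an interconnection.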
Optical Fredkin Gate Arrays

In a prior publication (4) we showed that optical Fredkin gate arrays allow

• logic (all functions),

• memory, and

• interconnections.

A pending publication (5) shows that an array of N(N-1)/2 Fredkin gates can make an N x N optical crossbar
switch. If the array can be addressed optically, it can be switched by a POHM. All of this is of interest to
an OBD conference only if OBD Fredkin gates are possible. We now show that they are.
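
The gate count quoted above can be illustrated with a short sketch. The text does not say how the N(N-1)/2 gates are arranged, so the triangular "bubble sort" layout used here is an assumption made only for illustration, as are the function names. Each gate either passes or swaps two neighboring lanes, and the control bits are found by sorting the lanes by destination.

```python
def crossbar_controls(dest):
    """Control bits for a triangular array of swap gates (assumed layout).

    dest[i] is the output port wanted for input i. Returns a list of
    (lane, control) pairs, one per gate, in the order light meets them.
    """
    n = len(dest)
    lanes = list(dest)          # lanes[p] = destination of the signal now in lane p
    controls = []
    for sweep in range(n - 1):
        for p in range(n - 1 - sweep):
            swap = lanes[p] > lanes[p + 1]
            controls.append((p, int(swap)))
            if swap:
                lanes[p], lanes[p + 1] = lanes[p + 1], lanes[p]
    return controls


def route(signals, controls):
    """Send signals through the gate array; control 1 swaps lanes p and p+1."""
    lanes = list(signals)
    for p, c in controls:
        if c:
            lanes[p], lanes[p + 1] = lanes[p + 1], lanes[p]
    return lanes


dest = [2, 0, 3, 1]                       # input 0 -> output 2, input 1 -> output 0, ...
controls = crossbar_controls(dest)
outputs = route(["a", "b", "c", "d"], controls)
gate_count = len(controls)                # N(N-1)/2 gates for N = 4
```

In this picture the list of control bits is the pattern that one hologram of the POHM would have to project onto the gate array.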


An ordinary logical gate is really a nonlinear function generator. Two input binary variables (A and B)
generate a binary output C. Since there are only four possible A, B patterns, C is often represented by a
truth table. The truth table for the AND function is shown below.

    A  B    C = A AND B
    0  0    0
    0  1    0
    1  0    0
    1  1    1

There are fewer outputs than inputs, so information is lost. For example, if A AND B is 0, we can no longer
say what values A and B had.


A Fredkin gate conserves information. The next Figure shows a Fredkin gate schematically. There are
three inputs (A, B, and C) and three outputs (A', B', and C'). Given one set of three, we can infer the
other set using

    C' = C

    IF C = 0:   A' = A,   B' = B
    IF C = 1:   A' = B,   B' = A


Reference  5  and  other  references  therein  show  that  such  gates  can  perform  all  logical  functions,  many  memory 
operations,  and  quite  generalized  switching. 
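
The claim that either set of three lines determines the other can be checked by brute force. The following is a minimal sketch, not taken from the references; the function names are hypothetical.

```python
from itertools import product


def fredkin(a, b, c):
    """Return (a', b', c'): swap a and b when c is 1; c passes through unchanged."""
    return (b, a, c) if c else (a, b, c)


def and_gate(a, b):
    """Ordinary two-input AND, used here only for contrast."""
    return a & b


states = list(product((0, 1), repeat=3))

# Applying the gate twice restores the inputs, so the gate is its own inverse.
is_reversible = all(fredkin(*fredkin(*s)) == s for s in states)

# The eight input patterns map onto eight distinct output patterns.
distinct_outputs = len({fredkin(*s) for s in states})

# AND sends several different input pairs to the same output value.
and_preimages_of_zero = [(a, b) for a, b in product((0, 1), repeat=2)
                         if and_gate(a, b) == 0]
```

Because the map is one-to-one on all eight states, nothing is discarded, which is the sense in which the gate conserves information.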


We turn now to OBD Fredkin gates. One way to assemble one of these is shown below.


The intensities of A and B are below threshold and so is their sum, so with C=0, the OBD reflects, achieving
A'=A and B'=B. With sufficient applied signal (C=1), the OBD transmits, giving A'=B and B'=A.

A second version is shown in Figure below. The A state is vertically polarized. The B state is
horizontally polarized. An ordinary beamsplitter (OBS) directs light to a polarizing beamsplitter (PBS)
which directs the vertically polarized light down to form A' and transmits the horizontally polarized
light to form B'. Thus in the reflective mode of the OBD (C=0), A'=A and B'=B. When the OBD threshold is
exceeded (C=1), the vertically polarized light, A, is reflected into the B channel and vice versa.

A third version, preferable to the second for cascading, marks A and A' with vertical polarization and B
and B' with horizontal polarization. This is easy to do as shown in Figure below.




Also shown in Figure above is a lossless recombiner.

Let us represent a generic OBD Fredkin gate as in Figure below.


We can then combine these in various ways (6). For example, to connect a linear array of 2^N sources to a
linear array of 2^N detectors we need 2N layers of OBD Fredkin gates as indicated in Figure below for N=2.


If we use a POHM to switch the N(N-1)/2 OBD Fredkin gates, the switching time is limited by the slower of
two times: the detector response time or the laser deflection time.

CONCLUSIONS 

Optical bistable devices (OBDs) can be viewed as optically controllable operators. Arrays of optical
bistable devices can be "programmed" by patterned light from any of 10* to 10* holograms, any of which can be
accessed in a laser deflection time (from milliseconds mechanically to microseconds acoustooptically to
nanoseconds electrooptically). If the OBDs can respond in nanoseconds, this represents the switching
speed. Although 10* to 10* is a large number of "programs," it is certainly finite. In this sense we have
a Reduced Instruction Set Computer.

ABSTRACT

Holography can be used for arbitrary, parallel, weighted mappings between planes with up to 10^6
pixels each at I/O limited rates. This allows precalculated mappings to occur at very high speed. The
applications for one-to-one, one-to-many, many-to-one, and many-to-many maps are explored here.

1. INTRODUCTION

Recently (1,2), I have shown that it is possible to map large input scenes (up to 1000 x 1000) into
large output scenes (up to 1000 x 1000) using arrays of holograms. Figure 1 shows a schematic drawing of a
passive (Spatial Light Modulator or SLM) input system. Figure 2 is a schematic drawing of an active (source
array) input. We will devote less attention to the hardware than to the applications in what follows.


Fig. 1. An N x N hologram array is imaged onto an N x N output array through
an SLM. Each hologram illuminates the SLM with a unique pattern
of light. All work in parallel.

Fig. 2. An Active Massive Interconnect System Using
An Array Of Modulated Sources As The Input.


Fig. 3. One Way of Representing A Four-
Dimensional Space in Two
Dimensions.

The three types of mapping of interest below are one-to-one, one-to-many, and many-to-many.

2. ONE-TO-ONE MAPPING

This application is sometimes called "coordinate transformation." Some familiar transformations are


    x,y  ->  r,θ     (Polar)
    r,θ  ->  x,y
    x,y  ->  log(x),log(y)
    x,y  ->  exp(x),exp(y)
    etc.

The patterns are calculated and embodied in holograms. The transformations are then speed limited by
Input/Output or I/O.
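
As an illustration of a precalculated one-to-one mapping, the sketch below builds a lookup table from x,y pixels to r,θ bins once and then applies it. The table plays the role the holograms play in the optical system. The grid sizes, the binning rule, and the function names are assumptions chosen for the example, and the code assumes n is at least 2.

```python
import math


def build_polar_map(n, n_r, n_theta):
    """Precompute, for each pixel of an n x n input, its (r, theta) output bin."""
    c = (n - 1) / 2.0                       # center of the input plane
    r_max = math.hypot(c, c)                # radius of the corner pixels
    table = {}
    for iy in range(n):
        for ix in range(n):
            x, y = ix - c, iy - c
            r = math.hypot(x, y)
            theta = math.atan2(y, x) % (2 * math.pi)
            ir = min(int(r / r_max * n_r), n_r - 1)
            it = min(int(theta / (2 * math.pi) * n_theta), n_theta - 1)
            table[(ix, iy)] = (ir, it)
    return table


def apply_map(image, table, n_r, n_theta):
    """Route every input pixel to its precalculated output bin."""
    out = [[0.0] * n_theta for _ in range(n_r)]
    for (ix, iy), (ir, it) in table.items():
        out[ir][it] += image[iy][ix]
    return out


table = build_polar_map(9, 4, 8)
image = [[1.0] * 9 for _ in range(9)]
polar = apply_map(image, table, 4, 8)
total_out = sum(sum(row) for row in polar)
```

All of the trigonometry is done when the table is built; applying it is pure routing, which is why the optical version is limited only by I/O.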

It is not necessary that the mapping have a geometric interpretation. For example, x and y could be
the first two principal components in an N dimensional feature space (the two orthogonal vectors in that
space along which best separation among events occurs). Thus the x-y location of an event gives a
closest-category interpretation and a probability measure. We can use as an output one dimension along
which categories are arrayed and a second dimension which gives the probability scale for that event.

Likewise, mappings need not be confined to two dimensions. We can represent N dimensional spaces by
positions along a one-dimensional space filling curve. Or we can sample the space in some more pictorial
fashion such as shown in Figure 3.

In principle, we do not absolutely require uniform spacing in either the input plane or the output
plane, so, for example, uniformly spaced x,y points can be transformed into nonuniformly spaced r,θ points.
This allows an "exact" (no interpolation/extrapolation) mapping. This can present an accuracy problem if it
requires the holograms to overlap.

3. ONE-TO-MANY AND MANY-TO-ONE MAPPINGS

A good example of the use of holographic many-to-one mapping is in an optical Dempster-Shafer (D-S)
evidential reasoning machine (3). It is easy to show that we can update our beliefs on the basis of new
evidence by using vector outer products to "correlate" evidence and holograms to route

- outer product terms consistent with proposition Pj to a detector to give the unnormalized support Sj,

- outer product terms inconsistent with Pj to a detector to give the unnormalized doubt Dj, and

- mutually inconsistent outer product terms into a single detector to give a term I.


We then calculate our new beliefs

    bj = (sj, pj)

about proposition Pj, where

    sj = support for Pj = Sj/(1-I)

and

    pj = plausibility of Pj = [1 - (Dj/(1-I))].
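
The routing described above can be mimicked numerically for a single proposition. This is a sketch under stated assumptions: each body of evidence assigns mass to "P" (for Pj), "notP" (against Pj), and "any" (uncommitted), and the numbers are invented for illustration. The loop over pairs is the outer product, and the three sets stand in for the three hologram routes.

```python
def combine_evidence(m1, m2):
    """Outer-product combination of two mass assignments over P, notP, any."""
    consistent = {("P", "P"), ("P", "any"), ("any", "P")}
    inconsistent = {("notP", "notP"), ("notP", "any"), ("any", "notP")}
    conflict = {("P", "notP"), ("notP", "P")}
    S = D = I = 0.0
    for k1, v1 in m1.items():
        for k2, v2 in m2.items():
            term = v1 * v2                  # one outer product term
            if (k1, k2) in consistent:
                S += term                   # routed to the support detector
            elif (k1, k2) in inconsistent:
                D += term                   # routed to the doubt detector
            elif (k1, k2) in conflict:
                I += term                   # mutually inconsistent terms
    s = S / (1.0 - I)                       # normalized support
    p = 1.0 - D / (1.0 - I)                 # plausibility
    return S, D, I, s, p


# Hypothetical evidence from two sources.
m1 = {"P": 0.6, "notP": 0.1, "any": 0.3}
m2 = {"P": 0.5, "notP": 0.2, "any": 0.3}
S, D, I, s, p = combine_evidence(m1, m2)
```

The support never exceeds the plausibility; the gap between them is the mass left uncommitted after the update.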

A good one-to-many application is the Hough transform. In a Hough transform for parametric fit of
straight lines to x,y input points, we might use equations of the form

    y = ax + b.

The straight lines through the point (x0, y0) satisfy

    b = -x0 a + y0.

That is, a point in the x,y plane maps into a straight line in b,a space. Two points map into two straight
lines. The intersection of those lines gives the b and a of the line through both points. Many x,y points
lead to many intersections in b,a. On the other hand, points "pile up" near b,a points which represent
multiple point straight lines in x,y. With this method we can do large Hough transforms in O(1) time
(~ milliseconds due to I/O).
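
The "pile up" in b,a space can be seen in a small serial sketch. The point set and the integer slope grid are invented for the example; the optical system would cast all of these votes at once, whereas this code loops over them.

```python
from collections import Counter


def hough_lines(points, slopes):
    """Each x,y point votes along its line b = -x0*a + y0 in b,a space."""
    votes = Counter()
    for x0, y0 in points:
        for a in slopes:
            b = y0 - a * x0
            votes[(a, b)] += 1
    return votes


# Four points on y = 2x + 1 plus one stray point.
points = [(0, 1), (1, 3), (2, 5), (3, 7), (2, 0)]
votes = hough_lines(points, slopes=range(-3, 4))
(best_a, best_b), count = votes.most_common(1)[0]
```

The cell with the most votes gives the slope and intercept of the line shared by the four collinear points.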
4. MANY-TO-MANY MAPPINGS

There is evidence that biological reasoning uses mapping of information from one "frame of reference"
to another (4). Locations representing multiple data coincidence neighborhoods can map into decision plane
neighborhoods in another plane.

Of course, the ultimate computer goal is intelligence. The physical basis for an intelligent computer
must be highly complex. To achieve useful intelligence, we will need high speed as well. Using this method
we can interconnect each of a 1000 x 1000 input array to each of a 1000 x 1000 output array fully in
parallel. This combination of complexity and speed (e.g. 10^12 interconnections in a millisecond or 10^15
connections per second) could serve as the physical basis for true intelligence if much more attention is
devoted to how to transduce cognitive concepts into the appropriate form of entry into this system ( ).
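
The interconnection figures follow from simple counting, sketched here with the array size and the millisecond I/O time stated in the text. The variable names are hypothetical.

```python
# Every input pixel connects to every output pixel.
inputs = 1000 * 1000
outputs = 1000 * 1000
interconnections = inputs * outputs            # 10**12 weighted connections

# One full mapping per millisecond, i.e. 1000 mappings per second.
mappings_per_second = 1000
connections_per_second = interconnections * mappings_per_second
```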

5.  CONCLUSION 

The ability of optics to perform full interconnection from a large input array to a large output array
in parallel creates many new possibilities. Fast algorithms, e.g. for Hough transforms, are not needed. If
we need the speed and can afford the hardware, they can be done in the time required for I/O. Advanced
computational methods which would be too slow with partially serial electronics become feasible with parallel
optics. In particular, massive neural networks fit in this category.

STACKED PAGE ORIENTED HOLOGRAPHIC MEMORY
H. J. Caulfield

ABSTRACT

While page oriented holographic memories are extremely valuable, they can take a great amount of lateral
space. We show here how to stack a number of holograms in such a way that we can select one layer to be
"active." As a result, the lateral area needed to store a given number of holograms is reduced by L, the
number of layers.

1. INTRODUCTION

Conceived of by Smits and Gallaher (1) as a computer memory, the Page Oriented Holographic Memory (POHM)
appears to have found many applications in optical computing (2, 3, 4, 5, 6). The basic POHM geometry is shown
in Fig. 2. A laser beam is deflected to the proper subhologram. Whatever hologram is illuminated produces
an output light pattern at a preselected location. In optical computing, we normally use an optically
addressed spatial light modulator at that location.


Fig. 2. Light entering from the left can be
directed out any of the cells.


The primary problem with POHMs is that the subholograms need to be one to two millimeters in diameter.
If we want to have, say, a 1000 x 1000 array, we need at least one square meter of substrate.
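
The area estimate, and the saving that stacking into L layers is meant to provide, can be sketched as follows. The function name is hypothetical, the subholograms are assumed to sit on a square grid, and the value L = 10 is chosen only for illustration.

```python
import math


def lateral_area_m2(n_holograms, pitch_mm, layers=1):
    """Substrate area when the holograms are spread over `layers` stacked sheets."""
    per_layer = math.ceil(n_holograms / layers)
    area_mm2 = per_layer * pitch_mm ** 2       # each subhologram fills one grid cell
    return area_mm2 / 1e6                      # mm^2 to m^2


flat = lateral_area_m2(1000 * 1000, pitch_mm=1.0)                 # single-layer POHM
stacked = lateral_area_m2(1000 * 1000, pitch_mm=1.0, layers=10)   # assumed L = 10
```

With one millimeter subholograms a flat array needs one square meter, and stacking divides that lateral area by the number of layers.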

The goal of this work is to find a way to compact the POHM laterally by extending it longitudinally into
L layers. If we can then select the layer of interest, we can address by x, y, and k, where k is the index
of the layer.

2. LAYER SELECTION

Polarization seems to be the most logical layer selection method. A longitudinal Pockels cell can
change the polarization of the light passing through it. A second longitudinal Pockels cell can change
the polarization back to its original state. Thus, what we want is a POHM which works for one polarization
state but not for the orthogonal state. For many years I have sought suitably asymmetric holograms. The
only hologram with a truly massive asymmetry I have come up with (thanks to Steve Case and Tomasz Jannson)
is a thick hologram in which the rays are diffracted by 90° inside the hologram. For a variety of reasons,
this is not a good solution for stacked POHMs. Thus I turned to what I call "polarization transducers" -
devices which convert polarization into other properties.

A polarization transducer is a device that changes polarization (which is easy to control) into some
other property which might be more difficult to control. In particular, having had some previous experience
in using polarization switches and birefringent prisms to direct light to suitable holograms (2), I thought
to apply this technology to stacked holograms. What now follows is a step by step description of the
buildup of a stacked hologram array.

First, we can consider a single element as shown in Figure 1. Depending on the polarization state, which
depends on the incident polarization and whether the switch is on or off, light is either transmitted
through the prism or deflected downward. The light which continues to propagate can enter other such units.
The light deflected downward strikes a hologram on the bottom side of the prism. Thus, that hologram is
illuminated or not illuminated depending on whether the polarization of light passing through the switch is
proper.

Second, we can stack a large number of these longitudinally. Figure 2 shows the scheme. Clearly, one
addressing beam can address any of those holograms.

Third, we can arrange a plane filled with such devices as shown in Figure 3. Now, wherever we put a beam
along a line, we can read out from a particular layer. That is, we can address a two dimensional array of
holograms via a one dimensional scan plus polarization switching of layers.
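
The addressing logic of a longitudinal stack can be sketched as a toy model. It assumes ideal elements: each switch either rotates the polarization by 90° or leaves it alone, each prism deflects one polarization completely and transmits the other completely, and the beam enters in the transmitted state. The function names are hypothetical.

```python
def addressed_hologram(switch_on):
    """Index of the hologram that is illuminated, or None if the beam passes through.

    switch_on[k] is True when the polarization switch in front of prism k
    rotates the plane of polarization by 90 degrees.
    """
    deflecting = False                  # beam enters in the transmitted polarization
    for k, on in enumerate(switch_on):
        if on:
            deflecting = not deflecting
        if deflecting:
            return k                    # prism k sends the beam down onto its hologram
    return None


def switch_pattern(k, n):
    """Switch settings that select hologram k out of n: only switch k is on."""
    return [i == k for i in range(n)]


selected = addressed_hologram(switch_pattern(2, 5))
passed = addressed_hologram([False] * 5)
```

Adding the one dimensional scan of the third step would simply pair this index with a beam position along the line.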


Fig. 1. Light is incident on a polarization switch which
either rotates the plane of polarization 90° or
leaves it unrotated. The light then enters a
polarizing prism. Depending on the polarization,
that prism transmits or deflects the beam downward
by 90°. A hologram on the bottom of the
prism can, therefore, be addressed. We assume
that the hologram is highly efficient, transmitting
at most 10^-3 of the incident light.
Such holograms are now routine in dichromated
gelatin.


Fig. 3. By expanding the prisms and cells continuously
we can make a sheet of cells of the
Fig. 1 type. The divisions, shown dashed,
correspond to separate light paths but not
to physical divisions.


Fourth, we come to the most difficult part: stacking these layers on top of each other so that we fill a
three dimensional space full of accessible holograms (Fig. 4). The problem with this scheme is that each
layer must be read out through all layers between it and the target plane. Those layers contain a variety of
switches, prisms, and holograms. It remains to be seen what quality of image can be formed through these.
Also, polarization effects could be disastrous. What is certain is that the exposures must take place
through all intervening media. Holograms so recorded can compensate for a great many minor refractive
defects. On the other hand, they cannot compensate for changes in polarization or for light bent away from
the region of the hologram itself. In all likelihood, it will take considerable experimentation to learn to
do this properly.


Fig. 4. In the 3D version, the Pockels cells are
continuous in two dimensions while the
prisms fall into distinct layers but are
continuous within the layers.


3.  DISCUSSION 

The primary effect of the scheme we have just discussed is to fill a three dimensional space rather than
a two dimensional space with page oriented holograms. This is clearly a better use of space than the
traditional POHM. On the other hand, much remains to be explored concerning how well holograms can perform
through layers of other holograms. Some sort of quasi Fourier transform hologram seems indicated, see Fig.
5. To select a hologram, we select a layer by switching one of the Pockels cells and then deflect to the x-y
position.


Fig. 5. Each hologram can be a Fourier transform hologram,
so there is a common output plane. That plane is
off axis so efficient holograms can be made. An
example hologram selection is shown.

APPENDIX D


MISCELLANEOUS  APPLICATIONS 


1. FREDKIN GATES

The initial papers in this field (Appl. Opt. 25, 1604 and SPIE 625,
2) stirred much research and are widely cited. This work has led (not
sponsored by ONR) to sub kT operation of optical processors. Our ONR
sponsored work was on various applications such as rapidly programmable
switches (Appl. Opt. 26, 1032), new configurations (Appl. Opt. 26, 3455),
and a more compact residue arithmetic architecture (Appl. Opt. 26, 3940).

2.  PATTERN  RECOGNITION 

Despite the wonderful invariance properties of prior recognition
masks, they were very difficult to manufacture. We showed how to
simplify mask design and manufacture tremendously (Appl. Opt. 26,
2311; SPIE 613, 260; and Appl. Opt. 27, 2895). This work has led to
much other work (still ongoing under other sponsorship at several
institutions).

3.  NEURAL  NETWORKS 

While neural networks are the obvious application of massively
parallel optical interconnections, they present huge accuracy problems.
We showed the first general way to train neural networks for low accuracy
operation (IEEE Trans. Systems, Man, and Cybernetics, accepted). We
then mapped out a general approach to utilizing the new-found complexity
capability (WNN-AIND 90 and IJCNN 90).

4.  FUNDAMENTAL  BOUNDS 

One of the most cited papers from this whole contract is the
demonstration that parallel optical processors have a fundamental speed
limit of about 0.01 GHz (Appl. Opt. 2£, 1567).


Optical  computing  and  the  Fredkin  gates 


Joseph Shamir, H. John Caulfield, William Miceli, and Robert J. Seymour


The use of optics to implement the Boolean logic functions traditionally used in conventional electronic
computing is an active area of optical computing research. Many proposed optical implementations duplicate
the configuration of electronic logic gates and hence may not optimally utilize the full benefits of optical
techniques. We present here a new optical gate, the Fredkin gate, which may, in principle, be minimally
dissipative (i.e., exhibit reversible logic) and whose response time may be limited in some implementations
only by the duration of optical pulses (i.e., in the picosecond range). Such gates, which consist of three input
and three output lines, can be programmed to produce a standard set of Boolean functions and appear well
matched to the parallelism of optics. We present here a number of optical implementations of Fredkin gates
and suggest ways of composing their interconnections to achieve combinatorial logic, circulating memories,
and generalized interconnects.


I.  Introduction 

“The energy requirements of basic logic operations
ultimately impose fundamental limits on achievable
computation rates and are largely independent of device
implementation technology.”1 Part of this energy
consumption is due to the intrinsic nature of the
traditional composition of logic elements. This fact
becomes evident if we recall that a conventional logic
gate has more input lines than output lines. Thus
some of the information coming into the gate is lost
and cannot be retrieved. The irreversible nature of
the gate makes it dissipative not only in information
but also in energy. In an effort to overcome these
limitations, Fredkin and Toffoli2 proposed a new kind
of logic gate which has the same number of output lines
as it has input lines. Fredkin gates are capable of
performing conventional logic operations while preserving
all the original information. In contrast to the
conventional logic gates, the Fredkin gates may, in
principle, be run backward to regenerate the original
input signals.

The purpose of this work is to introduce the optical
Fredkin gate, illustrate its programmability, and suggest
it as a basic building block of an optical computer.
An overview of the main aspects of the Fredkin gate is
given in the next section, followed by a variety of proposed
optical implementations. A number of useful
configurations are discussed in a final section.

II.  Background  of  the  Fredkin  Gates 

The basic Fredkin gate is defined as a black box
having three binary inputs and three binary outputs
(Fig. 1). The C-input, the control line, determines the
operations of the gate on the other two inputs according
to the following rules:

    C' = C
    if C = 0:  A' = A;  B' = B;        (1)
    if C = 1:  A' = B;  B' = A.

It is quite evident that this gate is reversible; i.e., it
may be run backward to return to the original inputs,
and, therefore, it is in principle nondissipative. [The
original definitions used in Ref. 2 are the inverse of Eq.
(1); however, we find this definition more intuitive and
more suitable for optical implementation.]

Using the three inputs and the three outputs of the
Fredkin gate, one may implement the traditional logic
gates that usually have two input lines and one output
line. To make the comparison easier, in the examples
of Fig. 2 we leave the lines corresponding to the conventional
gates straight while the other lines are shown
bent. In Fig. 2(a) an AND gate is implemented keeping
the A input at the 0 level and obtaining the required
output on the A' line. Unlike conventional gates, we
obtain two additional outputs that we may utilize or
ignore. In a similar way, one possible implementation
of an OR gate is shown in Fig. 2(b). It can be easily
shown that any other function, such as NOT, FAN-OUT,
FAN-IN, and FLIP FLOP9, is also easily implemented.
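
The programming of a Fredkin gate by fixing one input can be checked directly from Eq. (1). In this sketch the AND wiring follows the description in the text (A held at 0, output read on A'). The OR, NOT, and FAN-OUT wirings are one possible choice worked out from Eq. (1), not a transcription of Fig. 2, and the function names are hypothetical.

```python
def fredkin(a, b, c):
    """Eq. (1): return (a', b', c'), swapping a and b when c is 1."""
    return (b, a, c) if c else (a, b, c)


def and_gate(x, y):
    # A is held at 0; x drives C and y drives B; the result is read on A'.
    return fredkin(0, y, x)[0]


def or_gate(x, y):
    # B is held at 1; x drives C and y drives A; the result is read on A'.
    return fredkin(y, 1, x)[0]


def not_and_fanout(x):
    # A = 1 and B = 0: A' carries NOT x, while B' and C' both carry copies of x.
    a_out, b_out, c_out = fredkin(1, 0, x)
    return a_out, (b_out, c_out)


truth_and = [and_gate(x, y) for x in (0, 1) for y in (0, 1)]
truth_or = [or_gate(x, y) for x in (0, 1) for y in (0, 1)]
```

In each case the two lines that are not read out still carry valid signals, which is what allows the unused outputs to be reused or the gate to be run backward.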
Fig. 1. Fredkin gate (see also text).

Fig. 2. Logic gates implemented by the use of Fredkin gates: (a) AND
gate; (b) OR gate.


Fig.  3.  Polarization  switching  gate. 

In the next section we discuss a number of ways to
implement the Fredkin gate by optical and electrooptical
means.

III. Optical Implementations of the Fredkin Gate

For applications in logic networks one is usually
interested in logic gates containing nonlinear bistable
elements. The basic configuration of a Fredkin gate,
however, is not restricted to digital signals, and, in
principle, one may use these gates for processing analog
signals as well. In the examples that follow the
nature of the control signal will determine the actual
response of the gate.

A.  Polarization  Switching  Gate 

A polarization switching gate is shown in Fig. 3. The
A and B lines correspond to two orthogonal polarizations
of a light beam (or a waveguide channel of an
integrated optical system) traversing an electrooptic
modulator that rotates both polarizations by 90° when
activated. The activation is induced by the C-line
either by a direct electronic pulse or, as shown in the
figure, by an optical signal transduced to an electronic
signal using a photodetector (photoconductor or
photodiode-amplifier combination). Polarizing beam
splitters may be applied whenever a spatial separation

Fig. 4. Acoustooptic gate.

is required between the A and B lines. The main
advantage of this gate is its relative simplicity, while its
disadvantage is the different nature of the C-line, which
also changes level during transition through a gate (i.e.,
there is a lower light intensity in C' than in C; this
effect may, however, be corrected by incorporating an
amplifying medium on the line).
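
The swap produced by a 90° rotation can be written in Jones-vector form. The sketch below assumes an ideal, lossless rotator and treats the A and B lines as the two orthogonal field amplitudes; the function name and the sign convention are choices made for illustration.

```python
def polarization_gate(a_amp, b_amp, c):
    """Return the intensities (A', B') of the two polarization channels.

    a_amp and b_amp are the field amplitudes of the orthogonal A and B
    polarizations; c = 1 activates the 90 degree rotator.
    """
    ex, ey = a_amp, b_amp
    if c:
        # A 90 degree rotation sends (Ex, Ey) to (-Ey, Ex).
        ex, ey = -ey, ex
    return abs(ex) ** 2, abs(ey) ** 2


unswitched = polarization_gate(1.0, 0.0, 0)
switched = polarization_gate(1.0, 0.0, 1)
```

The sign change disappears in the intensities, so the detected outputs obey A' = B and B' = A when the control is on.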

B.  Acoustooptic  Gate 

In Fig. 4 we show a schematic diagram of the acoustooptic
gate: The two input lines are laser beams
incident on an acoustooptic deflector (either bulk or
integrated SAW) at the Bragg angle. If there is no
acoustic signal (C = 0), the two beams continue unaffected
(A' and B'), while if C is present each beam is
deflected into the other channel. This is also a simple
gate, but here too one has a C-line which is basically
different in nature than the other two lines. Nevertheless,
this kind of gate can be easily cascaded and
integrated. For example, a single acoustic pulse may
activate many gates as it travels along the system.

Of course, any 100% efficient gateable diffractor will
suffice. Such devices are possible in integrated optics.

C. Photorefractive Gate

The photorefractive gate based on four-wave mixing
is an all optical gate with one of its tentative implementations
illustrated in Fig. 5. In this case the C-line
constitutes the two pump beams. The inputs A and B
are transmitted if C is absent and phase-conjugated
when the pump is present, resulting in switching between
the outputs.

D. Waveguide or Coupler

In optical communication and integrated optical
systems a modulated waveguide or fiber coupler may
serve as a Fredkin gate. Two general classes of this
kind of gate may be implemented. The out-of-plane
control is shown schematically in Fig. 6(a) and the
in-plane control, with one possibility depicted, in Fig.
6(b). A number of workers have already implemented
the electronically addressed coupler4-5 that may serve
as a Fredkin gate with an electronic C-input. To symmetrize
the system one may use photodetection combined
with the electrooptic coupler to facilitate optical
control. A more advanced technology would be the

Fig. 5. Photorefractive gate using four-wave mixing in photorefractive
material (P.R). Beam splitters (B.S) are needed for output
coupling.

Fig. 6. Waveguide coupler gate. The coupling region activated by
line C is a photorefractive or other nonlinear material or electrooptic
material: (a) out-of-plane control; (b) in-plane control.

use of photorefractive material for direct optical control
of the coupling constant. The example in Fig. 6(b)
is a waveguide coupler incorporating highly anisotropic
guides containing nonlinear material. The two
coupling waves (A and B) are introduced with the same
polarization so that they can couple while the control
signal C is orthogonally polarized so that its power is
used to activate the coupling between the A and B
channels, but it does not couple itself into the other
guide.8


IV. Proposed Devices Incorporating Fredkin Gates

We  demonstrate  the  applicability  of  these  new  gates 
by  proposing,  in  addition  to  the  conventional  logic 
gates,  two  very  useful  devices  that  incorporate  arrays 
of  the  waveguide  gates  shown  in  Fig.  6. 

A.  Optical  Crossbar 

The gate array of Fig. 7 may be constructed of gates
of the type depicted in Fig. 6(a) or the type in Fig. 6(b).
In the first case each gate may be accessed randomly
from above by an electric field or by light, depending
on the specific device used. As we are dealing with
optical computing we might prefer activation by light
such as a holographic coupler8 or fiber coupler. With
proper addressing each input line can be coupled to
each output line. This system may prove to be an
extremely fast and efficient crossbar or optical switchboard.
The in-plane addressing of Fig. 6(b) is applicable
if one desires to activate a whole column together.
At first sight it appears that this kind of addressing is
not suitable for random access; however, with very fast
pulses this also becomes feasible.

B.  Tapped  Delay  Line 

The basic configuration of Fig. 8(a) is a tapped delay
line. A fiber ring may be utilized for long delays, while
for very short delays one may use waveguide rings, the
feasibility of which has also been demonstrated.91
Here too the addressing may be of the first type [Fig.
6(a)] or of the second type [Fig. 6(b)]. Such a setup
may be used to delay all the energy in a pulse or just
part of it to produce a pulse train from a single initial
pulse. A slight modification of the system as illustrated
in Fig. 8(b) may be used to reverse the direction of
signal flow, resulting in a true reversible Fredkin gate.
In the future, an optical memory block may resemble
the array depicted in Fig. 8(c). This seems to be a
short-term memory, but with the integration of amplifying
medium it may serve also as a long-term memory.
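
How a tapped ring turns one pulse into a pulse train can be sketched with a lossless model in which a fixed fraction of the circulating energy is coupled out on every round trip. The tap fraction, the number of round trips, and the function name are assumptions made for the example.

```python
def pulse_train(energy, tap_fraction, round_trips):
    """Energies tapped out of a delay ring on successive round trips.

    Returns (outputs, circulating), where circulating is the energy still
    in the ring after the last round trip.
    """
    outputs = []
    circulating = energy
    for _ in range(round_trips):
        tapped = circulating * tap_fraction
        outputs.append(tapped)
        circulating -= tapped
    return outputs, circulating


train, left_over = pulse_train(1.0, 0.5, 3)      # partial taps give a decaying train
whole, nothing = pulse_train(1.0, 1.0, 1)        # a full tap delays the whole pulse
```

Without gain the train decays geometrically, which is why an amplifying medium would be needed to hold a signal for long times.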

V.  Discussion 

Initial approaches to optical computing have tended
to duplicate the evolution of combinatorial logic implemented
in semiconductor microelectronics.
Present configurations of semiconductor logic gates
are well suited for electronic computing but may not be
the best choice for optically implemented logic. In


Fig. 7. Integrated optical crossbar. The elliptic regions are the nonlinear coupling switches.

Fig. 8. Tapped delay line: (a) basic configuration; (b) reversing modification; (c) memory array.


this work we have resurrected the concept of reversible
logic, illustrated various optical implementations of
Fredkin gates, and suggested gate configurations capable
of combinatorial logic. These configurations combine
the communications advantage of optics with
noncapacitive multiline addressing of individual gates
and suggest their evaluation as a basic building block
for optical computers. The various implementations
illustrated here are intended to illustrate the potential
of this approach; future work will elaborate on specific
higher-order logical functions.

Joseph  Shamir  is  on  leave  from  the  Department  of 
Electrical  Engineering,  Technion — Israel  Institute  of 
Technology. 
Abstract

Much work is being done toward the optical implementation of traditional electronic processing and
computing methods. Many of the proposed methods may not be the optimal way to utilize the benefits of optical
techniques. We introduce here a new optical gate - the Fredkin gate - that is in principle minimally
dissipative and its response time in some implementations may be limited only by the duration of optical
pulses (i.e. in the subpicosecond range). To indicate the viability of this novel approach, a number of
optical implementations of Fredkin gates with some interesting applications are proposed.

Introduction 

One of the limitations imposed on increasing computation power, be it electronic or optic, stems from
the large amount of energy that needs to be dissipated during computer operation1. Part of this energy is
due to the intrinsic nature of the traditional logic elements. This fact becomes evident if we recall that
a conventional logic gate has more input lines than output lines. Thus some of the information coming into
the gate is lost and cannot be retrieved. The irreversible nature of the gate makes it dissipative not only
in information but also in energy. In an effort to overcome these limitations, Fredkin proposed a new kind
of logic gate which has the same number of output lines as it has input lines. Fredkin gates are capable
of performing conventional logic operations while preserving all the original information. In contrast to
the conventional logic gates the Fredkin gates may, in principle, be run backwards to regenerate the original
input signals.

The purpose of this work is to introduce the optical Fredkin gate which may become one of the basic
building blocks of an optical computer. An overview of the main aspects of the Fredkin gate is given in
the next section, followed by a variety of proposed optical implementations. A number of useful applications
are discussed in a final section.

Background  on  the  Fredkin  Gate 

The basic Fredkin gate is defined as a black box having three binary inputs and three binary outputs
(Figure 1). The C-input, the control line, determines the operation of the gate on the other two inputs
according to the following rules:

$$\text{if } C = 0:\ A' = A,\ B' = B; \qquad \text{if } C = 1:\ A' = B,\ B' = A. \tag{1}$$


Figure 1. Fredkin gate with inputs C, A, B and outputs C', A', B'.


It is evident that this gate is reversible, i.e. it may be run backward to return to the original
inputs, and thus it is, in principle, non-dissipative. (The definitions used here differ from the original
ones of Ref. 2; however, we find this definition more intuitive and more suitable for optical implementation.)
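
The reversibility claimed above can be checked by brute force over the eight input combinations. The following is a minimal sketch of the rules of Eq. (1); the function name `fredkin` and the argument order (C, A, B) are illustrative choices, not notation taken from the paper.

```python
from itertools import product


def fredkin(c, a, b):
    """Eq. (1): A and B pass straight through when C = 0 and are exchanged when C = 1."""
    return (c, b, a) if c else (c, a, b)


# Applying the gate twice restores every input, so the gate is its own inverse.
reversible = all(
    fredkin(*fredkin(c, a, b)) == (c, a, b)
    for c, a, b in product((0, 1), repeat=3)
)

# The number of 1s on the three lines is conserved, so no signal is discarded.
conserving = all(
    sum(fredkin(c, a, b)) == c + a + b
    for c, a, b in product((0, 1), repeat=3)
)
```

Both checks succeed because the gate only permutes its inputs; nothing is merged or erased, which is the property that distinguishes it from a conventional two-input, one-output gate.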
Using the three inputs and the three outputs of the Fredkin gate one may implement the traditional
logic gates that usually have two input lines and one output line. To make the comparison easier, in the
examples of Figure 2 we leave the lines corresponding to the conventional gates straight while the other lines
are shown bent. In Figure 2a an AND gate is implemented by keeping the A input at the 0 level and obtaining the
required output on the A' line. Unlike conventional gates, we obtain two additional outputs that we may
utilize or ignore. In a similar way, one possible implementation of an OR gate is shown in Figure 2b. It
can be easily shown that other functions, such as NOT, FAN-OUT, FAN-IN, and FLIP-FLOPs, are also easily
implemented. In the next section we discuss a number of ways to implement the Fredkin gate by optical and
electro-optical means.

Figure 2. Conventional logic gates implemented with a Fredkin gate: (a) AND; (b) OR.
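
As an illustration of how conventional gates follow from Eq. (1), the sketch below wires constants onto the spare inputs. The AND wiring (A held at 0) is the one described in the text; the OR, NOT, and FAN-OUT wirings are one possible choice made here for illustration and are not necessarily those of Figure 2b. All function names are hypothetical.

```python
def fredkin(c, a, b):
    """Eq. (1): exchange A and B when C = 1, pass them through when C = 0."""
    return (c, b, a) if c else (c, a, b)


def and_gate(x, y):
    # A input held at 0, x on the control line, y on the B line; result on A'.
    return fredkin(x, 0, y)[1]


def or_gate(x, y):
    # Assumed wiring: B input held at 1, x on the control line, y on the A line.
    return fredkin(x, y, 1)[1]


def not_gate(x):
    # Assumed wiring: A = 1, B = 0; the A' line carries the complement of C.
    return fredkin(x, 1, 0)[1]


def fan_out(x):
    # Assumed wiring: A = 0, B = 1; both C' and A' carry a copy of x.
    c_out, a_out, _ = fredkin(x, 0, 1)
    return c_out, a_out


truth_table = [(x, y, and_gate(x, y), or_gate(x, y)) for x in (0, 1) for y in (0, 1)]
```

In each case the unused outputs still carry information (for example, B' of the AND wiring equals NOT x AND y), which is what the text means by additional outputs that may be utilized or ignored.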


Optical  Implementations  of  the  Fredkin  Gate 


For applications in logic networks one is usually interested in logic gates containing nonlinear,
bistable elements. The basic configuration of a Fredkin gate, however, is not restricted to digital signals
and, in principle, one may use these gates for processing analog signals as well. In the examples that
follow the nature of the control signal will determine the actual response of the gate.


Figure 3. Polarization switching gate with an electro-optic (E.O.) modulator.

A polarization switching gate is shown in Figure 3. The A and B lines correspond to two orthogonal
polarizations of a light beam (or a waveguide channel of an integrated optical system) traversing
an electro-optic modulator that rotates both polarizations by 90° when activated. The activation is induced
by a direct electronic pulse or, as shown in the figure, by an optical signal transduced to an electrical
signal using a photodetector (photoconductor or photodiode-amplifier combination). Polarizing beam-splitters
may be applied whenever a spatial separation is required between the A and B lines. The main advantage of
this gate is its relative simplicity while its disadvantage is the different nature of the C-line that
also changes level during transition through a gate (i.e. there is a lower light intensity in C' than in C.
This effect may, however, be corrected by incorporating an amplifying medium on the line).

In Figure 4 we show a schematic diagram of the acousto-optic gate: The two input lines are laser beams
incident on an acousto-optic deflector (either bulk or integrated SAW) at the Bragg angle. If there is no
acoustic signal (C = 0), the two beams continue unaffected (A' and B') while if C is present each beam is
deflected into the other channel. This is also a simple gate but, here too, one has a C line which is
basically different in nature from the other two lines. Nevertheless this kind of gate can be easily
cascaded and integrated. For example, a single acoustic pulse may activate many gates as it travels along the
system.

Figure 4. Acousto-optic gate.


The photorefractive gate, based on four-wave-mixing3, is an all-optical gate with one of its tentative
implementations illustrated in Figure 5. In this case the C-line constitutes the two pump beams. The
inputs A and B are transmitted if C is absent and phase-conjugated when the pump is present, resulting in a
switching between the outputs.

Figure 5. Photorefractive gate.


In optical communication and integrated optical systems a modulated waveguide or fiber coupler may serve
as a Fredkin gate. Two general classes of this kind of gate may be implemented: the out-of-plane control,
shown schematically in Figure 6a, and the in-plane control with one possibility depicted in Figure 6b. A
number of workers have already implemented the electronically addressed coupler4,5 that may serve as a
Fredkin gate with an electronic C-input. To symmetrize the system one may use photodetection combined with
the electro-optic coupler to facilitate optical control. A more advanced technology would be the use of
photorefractive material for direct optical control of the coupling constant. The example in (b) is a
waveguide coupler incorporating highly anisotropic guides containing nonlinear material. The two coupling waves
(A and B) are introduced with the same polarization so that they can couple while the control signal, C, is
orthogonally polarized in such a way that its power is used to activate the coupling between the A and B
channels but it does not couple itself into the other guide6.
Proposed Devices Incorporating Fredkin Gates


We demonstrate the applicability of these new gates by proposing, in addition to the conventional logic
gates, two very useful devices that incorporate arrays of the waveguide gates shown in Figure 6.


Figure  6. 


The optical crossbar. The gate array of Figure 7 may be constructed of gates of the type depicted in
Figure 6a or the type of 6b. In the first case each gate may be accessed randomly from above by an electric
field or by light, depending on the specific device used. As we are dealing with optical computing we might
prefer activation by light such as a holographic coupler8 or fiber coupler. With proper addressing each
input line can be coupled to each output line. This system may prove to be an extremely fast and efficient
crossbar or optical switchboard.

The in-plane addressing of Figure 6b is applicable if one desires to activate a whole column together. At
first sight it appears that this kind of addressing is not suitable for random access; however, with very
fast pulses this also becomes feasible.


The tapped delay line. The basic configuration of Figure 8a is a tapped delay line. A fiber ring
may be utilized for long delays while for very short delays one may use waveguide rings, the feasibility of
which has also been demonstrated9,10. Here too, the addressing may be of the first type (Figure 6a) or of
the second (Figure 6b). Such a setup may be used to delay all the energy in a pulse or just part of it to
produce a pulse train from a single initial pulse. A slight modification of the system as illustrated in
Figure 8b may be used to reverse the direction of signal flow resulting in a true reversible Fredkin gate.

In the future, an optical memory block may resemble the array depicted in Figure 8c. This seems to be a
short-term memory, but with the integration of an amplifying medium it may serve also as a long-term memory.
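
The pulse-train behavior of the tapped delay line can be illustrated with a simple energy bookkeeping model. This is a sketch under assumptions that are not stated in the paper: the ring is lossless, and the tap couples out the same fixed fraction of the circulating energy on every round trip. The function and parameter names are hypothetical.

```python
def pulse_train(tap_fraction, n_pulses, energy=1.0):
    """Energies of successive output pulses from a ring with a fixed tap.

    tap_fraction = 1 delays the whole pulse by one round trip;
    0 < tap_fraction < 1 turns one input pulse into a decaying train.
    Returns (list of tapped energies, energy still circulating).
    """
    tapped = []
    remaining = energy
    for _ in range(n_pulses):
        out = remaining * tap_fraction
        tapped.append(out)
        remaining -= out
    return tapped, remaining


train, left_in_ring = pulse_train(0.5, 3)
```

With a tap fraction of 0.5 the first three pulses carry 0.5, 0.25, and 0.125 of the input energy, and 0.125 is still circulating. Without gain the stored energy decays geometrically, which is why the text describes the bare ring as a short-term memory.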
Figure 8. Panels (a), (b), and (c).


Discussion 

Conventional approaches to optical computing followed the lines put forward by workers with electronic
systems. Traditional logic gates are well suited for electronic computing but may not be the best choice
for optical processors. In this work we indicated that one should also consider different implementations
for optical computing systems with one very promising possibility being the Fredkin gate. These gates have
many simple optical implementations and may prove to be very fast and energy efficient. The various
implementations and applications given here are just samples to indicate the diverse possibilities available.

References 

1. R. Landauer, "Irreversibility and heat generation in the computing process," IBM J. Res. Vol. 5, pp.
183-191 (1961).

2. E. Fredkin and T. Toffoli, "Conservative logic," Int. J. Theor. Phys. Vol. 21, pp. 219-253 (1982).

3. See, for example, "Optical Phase Conjugation," R. A. Fisher, ed., Academic Press, N.Y., 1983.

4. C. S. Tsai, B. Kim, and F. R. El-Akkari, "Optical channel waveguide switch and coupler using total
internal reflection," IEEE J. QE-14, pp. 513-517 (1978).

5. C. L. Chang and C. S. Tsai, "GHz bandwidth optical channel waveguide TIR switches and 4x4 switching
networks," Topical Meeting on Integrated and Guided-Wave Optics, Jan. 6-8, 1982, Pacific Grove, Cal.

6 / SPIE Vol. 625 Optical Computing (1986)


6. R. A. Forber and E. Marom, "Optimization of symmetric zero-gap dielectric couplers for large
switching-array applications," CLEO, April 1985, Baltimore.

7. A. Lattes, H. A. Haus, F. J. Leonberger, and E. P. Ippen, IEEE J. QE-19, 1718- (1983).

8. J. W. Goodman, F. J. Leonberger, S. Y. Kung, and R. A. Athale, "Optical interconnections for VLSI,"
Proc. IEEE Vol. 72, pp. 850-866 (1984).

9. J. Haavisto and G. A. Pajer, "Resonance effects in low-loss ring waveguides," Opt. Lett. Vol. 5, pp.
510-512 (1980).

10. A. Mahapatra and W. C. Robinson, "Integrated-optic ring resonators made by proton exchange in
lithium niobate," Appl. Opt. Vol. 22, pp. 2285-2286 (1985).




High-efficiency  rapidly  programmable  optical 
interconnections 


Joseph  Shamir  and  H.  John  Caulfield 


An array of optical Fredkin gates implemented by optically controlled waveguide couplers is shown to
constitute a very efficient and versatile optical interconnection network with parallel addressing capability.
The characteristics of the array are analyzed using linear algebra to indicate interconnect programming
procedures. In terms of SNR this network is estimated to be comparable with previously proposed
architectures. However, from many other aspects (light transmission efficiency, number of switching elements,
speed, and fault tolerance) it has significant advantages.


I. Introduction

Optical interconnects were initially investigated for
application in integrated electronic processors.1-5 The
demand for highly efficient and fast optical interconnects
or programmable crossbars is now increasing
with the extensive progress made in the applications of
optical fiber communication networks and the expected
developments in optical computing. In a recent
work6 the benefits of the optical Fredkin gate were
discussed, and several optical implementations were
proposed. It was also pointed out there that an array
of these gates may function as an optically or electronically
addressed optical interconnection network. The
array of switching elements building up this network
may be addressed in parallel leading to a very fast,
light-efficient, and fully programmable device. In
principle, the operating speed of the network will be
limited by the addressing time, and that may be very
short if a page-oriented holographic memory7,8 is employed.
In such a memory bank each useful switching
pattern is stored as a hologram that may be addressed
by a deflected laser beam.9,10 Nanosecond addressing
time may be possible with an array of 1024 x 1024
holograms.

In the present work we analyze the operation of a
general Fredkin gate array interconnection network.
Although any optical implementation of the Fredkin
gate may be assembled as a useful array, reference here
will be made to the most promising one, the optically
addressed waveguide coupler array.

Most of the presently demonstrated switchable
waveguide couplers employ the electrooptic effect with
voltage control,11-14 but direct light-addressable
switches are already emerging.15,16 For our purposes
we are interested in direct light activated couplers.
However, a photodetector array connected to an
electrooptic switching array will be also quite efficient with
its speed limited only by detector delays.17

In the next section we describe the architecture of
the Fredkin gate network with a physical approach to
the addressing algorithm. The ideal gate array is
described in Sec. III by a linear algebraic approach
employing a unitary matrix group. The physical limitations
of a real system are discussed in Sec. IV taking into
account losses and crosstalk to evaluate an expected
SNR for an actual device. The implementation of
optical crossbars is addressed in Sec. V with a general
discussion following in Sec. VI.

II. Fredkin Gate Interconnection Network

The basic Fredkin gate is defined as a black box
having three binary inputs and three binary outputs
(Fig. 1). The C-input, the control line, determines the
operation of the gate on the other two inputs according to the
following rules:

$$C' = C; \qquad \text{if } C = 0:\ A' = A,\ B' = B; \qquad \text{if } C = 1:\ A' = B,\ B' = A. \tag{1}$$

It is evident that this gate is reversible; i.e., it may be
run backward to return to the original inputs, and,
therefore, it can be nondissipative, at least in principle.
Of the various optical implementations proposed in
Ref. 6 we are interested here in the waveguide coupler
Fig. 1. Fredkin gate.

Fig. 2. Waveguide coupler implementation of the Fredkin gate. I is the interaction region where coupling is switched ON or OFF.

Fig. 3. Four-channel array with four switching layers.


shown in Fig. 2, although any other optical Fredkin
gate is applicable. The two inputs, A and B, are
switched when the interaction region I is activated by
the control signal C. The most efficient construction
would involve a photorefractive interaction region
directly activated by light. However, the electrooptic
effect may also be used employing an amplified signal
from a photodetector receiving the C-input.

The waveguide coupler Fredkin gate of Fig. 2 is our
basic building block for constructing a general
interconnection network. Figure 3 represents the 4-input
and 4-output network. Proceeding from left to right
we encounter four layers of interaction regions (numbered
1-4) with each such region activated by an incident
control signal. Checking all possible switching
combinations one can show that with this arrangement
any input signal $a_i$ ($i = 1, 2, 3, 4$) may be coupled into any
output port $b_j$. In other words, all twenty-four permutations
are possible with four layers of switches, six
switches altogether. It is interesting to note that
there are sixty-four ($2^6$) possible switching states. Thus some
of them are redundant with respect to the output
configuration. As will be indicated, this redundancy is
very useful for fault tolerant operation.
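
The counting argument for the four-channel array can be checked by exhaustive enumeration. The sketch below assumes the layer layout described in the text: layers alternate between two couplers (channel pairs 1-2 and 3-4) and one coupler (pair 2-3), six couplers in total. The layout constant `LAYERS` and the function name are illustrative.

```python
from itertools import product

# Left-hand channel index (0-based) of each coupler, layer by layer.
LAYERS = [(0, 2), (1,), (0, 2), (1,)]


def apply_switches(state, n=4):
    """Route channel labels 0..n-1 through the array for one ON/OFF pattern."""
    channels = list(range(n))
    bits = iter(state)
    for layer in LAYERS:
        for k in layer:
            if next(bits):  # coupler ON: exchange the two adjacent channels
                channels[k], channels[k + 1] = channels[k + 1], channels[k]
    return tuple(channels)


states = list(product((0, 1), repeat=6))
settings_for = {}
for s in states:
    settings_for.setdefault(apply_switches(s), []).append(s)

n_states = len(states)
n_permutations = len(settings_for)
all_on = apply_switches((1,) * 6)
```

Six binary couplers give 64 settings, which map onto the 24 permutations of four channels, so many interconnections can be produced by more than one setting. Setting every coupler ON yields the complete inversion (3, 2, 1, 0) in 0-based labels.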

Using induction one may generalize the configuration
assuming that for $n = 2N$ channels one needs $n$
interaction layers. If we add two more input channels,
$a_{n+1}$ and $a_{n+2}$, as in Fig. 4, we need two more couplers
(the dotted ones in the figure) to switch either of the
two new signals into the old array. To make all permutations
possible the additional layers should be filled
out completely as will be indicated in the mathematical
description of the next section. We see that our
$n \times n$ network needs $n$ layers with alternating $n/2$ and
$n/2 - 1$ switches each. Thus the complete array needs
$n(n - 1)/2$ switches to establish all possible interconnections,
that is, less than half of the $n^2$ elements
required by most conventional networks.
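
As a brief check of the switch count quoted above, assume $n$ is even so that the $n$ layers split into $n/2$ layers with $n/2$ switches and $n/2$ layers with $n/2 - 1$ switches:

```latex
\frac{n}{2}\cdot\frac{n}{2} \;+\; \frac{n}{2}\left(\frac{n}{2}-1\right)
  \;=\; \frac{n}{2}\,(n-1)
  \;=\; \frac{n(n-1)}{2} \;<\; \frac{n^{2}}{2}.
```

For $n = 4$ this gives six switches, in agreement with the four-channel array of Fig. 3.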

Fig. 4. n-channel array with the addition of two more.

This whole switching array may be considered as a
generalized n-dimensional Fredkin gate: If all control
inputs are in the 0 state (all switching elements are
OFF) we obtain $b_i = a_i$ for all $i$, while a complete inversion,
i.e., $b_n = a_1$, $b_{n-1} = a_2$, etc., is obtained with all
control signals in the 1 state (all switches are ON).

III. Mathematical Analysis

For a mathematical analysis we return to our basic
element, the waveguide coupler Fredkin gate of Fig. 2,
and represent the input and output channels by vectors
a and b, respectively. The transformation of the
vector a into the vector b may be implemented by a 2 x
2 unitary matrix F(C), where the parameter C may
assume the two control values 0 and 1:

$$F(C) = \begin{bmatrix} 1 - C & C \\ C & 1 - C \end{bmatrix}. \tag{2}$$

The two possible transformations attainable with this
device may thus be written in the matrix form:

$$\mathbf{b} = F(C)\,\mathbf{a}. \tag{3}$$

This matrix formalism is easily extendable to a general
n-channel device: We observe that each interaction
region in a gate layer (see Fig. 3) involves only switching
between adjacent channels. Thus, if we describe
the input to this layer by the n-element vector a, it will
be transformed by a block-diagonal unitary matrix

$$P = \begin{bmatrix} F(C_1) & & \\ & \ddots & \\ & & F(C_{n/2}) \end{bmatrix}, \qquad
Q = \begin{bmatrix} 1 & & & & \\ & F(C_1) & & & \\ & & \ddots & & \\ & & & F(C_{n/2-1}) & \\ & & & & 1 \end{bmatrix}, \tag{4}$$

where P corresponds to an odd numbered layer, while
Q is a matrix corresponding to an even numbered layer.
Thus one may describe the complete system transformation
by a product of these matrices. The first matrix
to operate on the input vector will be $P_1$, the
second matrix will be $Q_2$, and the final one will be $Q_n$,
and we may write down the complete transformation
to be the relation

$$\mathbf{b} = T\mathbf{a}, \tag{5}$$

where the transfer matrix is

$$T = Q_n P_{n-1} \cdots Q_2 P_1, \tag{6}$$

where the subscripts denote layer numbers.
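
The layer-by-layer matrix product can be made concrete with a short sketch. It assumes the layer structure described in the text: odd-numbered layers couple channel pairs (1,2), (3,4), ..., and even-numbered layers couple pairs (2,3), (4,5), ... with the two outer channels passing straight through. The helper names and the per-coupler control lists are illustrative, and plain Python lists are used instead of a matrix library.

```python
def fredkin_block(c):
    """2x2 matrix F(C): identity for C = 0, exchange for C = 1."""
    return [[1 - c, c], [c, 1 - c]]


def layer_matrix(n, controls, odd):
    """Block-diagonal matrix of one layer of an n-channel array (n even).

    odd=True  -> couplers on pairs (1,2), (3,4), ...            (a P matrix)
    odd=False -> couplers on pairs (2,3), (4,5), ...; the outer
                 channels pass straight through                 (a Q matrix)
    """
    if n % 2:
        raise ValueError("this sketch assumes an even number of channels")
    pairs = list(range(0 if odd else 1, n - 1, 2))
    if len(controls) != len(pairs):
        raise ValueError("one control bit per coupler is required")
    m = [[0] * n for _ in range(n)]
    if not odd:
        m[0][0] = m[n - 1][n - 1] = 1
    for c, k in zip(controls, pairs):
        block = fredkin_block(c)
        for r in range(2):
            for s in range(2):
                m[k + r][k + s] = block[r][s]
    return m


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transfer_matrix(layers):
    """T = (last layer) ... (second layer)(first layer)."""
    n = len(layers[0])
    t = [[int(i == j) for j in range(n)] for i in range(n)]
    for m in layers:  # each later layer multiplies from the left
        t = matmul(m, t)
    return t


def apply_matrix(t, a):
    return [sum(t[i][j] * a[j] for j in range(len(a))) for i in range(len(t))]


# Four channels with every coupler ON, in the order the signal meets the layers.
P = layer_matrix(4, [1, 1], odd=True)
Q = layer_matrix(4, [1], odd=False)
T = transfer_matrix([P, Q, P, Q])
b = apply_matrix(T, [1, 2, 3, 4])
identity = [[int(i == j) for j in range(4)] for i in range(4)]
```

With all couplers ON the product reproduces the complete inversion, and each layer matrix multiplied by itself gives the unit matrix, which is the self-inverse property used when solving for the switch settings.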

Equation (6) is a simple mathematical relation that
describes any possible switching state of the complete
system. To solve the inverse problem, i.e., determine
the switching state for a given interconnection, the
requirement is not much more complicated. Since the
n x n matrix T must have a single element with value 1
in each row and each column with the rest of the
elements having the value 0, it is a simple matter to
write down this matrix for any interconnection required.
The next step is a decomposition of this matrix
into n matrices P and Q. This can be done easily
since these matrices are the inverse of themselves.
Thus one can take the T matrix and start multiplying
it by the P and Q matrices until it reduces to the unit
matrix. If we multiply Eq. (6) from the left by $Q_n$ this
matrix is eliminated from the right-hand side of the
equation. The procedure may now go on with the
matrix $P_{n-1}$ etc. until the unit matrix is obtained:

$$P_1 Q_2 \cdots P_{n-1} Q_n T = I, \tag{6'}$$

where I is the unit matrix. Physically, the two equations
(6) demonstrate the reciprocity property of a
nondissipative optical system.

We clarify the procedure using the six-channel array
illustrated in Fig. 5 with an arbitrarily chosen interconnection
pattern indicated on the right-hand side. To
write down the transfer matrix we observe that $a_3$ must
be transferred to the first channel. Thus the first row
should have its unit element in the third place. Similarly,
the second row will have its unit element in the
second column and so on until we construct the whole
matrix:

$$T = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}.$$

Fig. 5. Six-channel array with selected interconnections.

Our objective now is to find consecutive P and Q matrices
that will translate all the unit elements into the
matrix diagonal. To be most efficient in this procedure
we observe that the units in rows 3 and 4 are at the
largest distance from the diagonal, and they can both
be brought closer by interchanging them. This goal
may be attained by the P matrix (a Q matrix will do no
good at this stage):

$$P_5 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},$$


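where we also switched between the first two channels
since the unit of the first row is also far from the
diagonal. Proceeding in this manner, we have

$$Q_4 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}, \qquad
P_3 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix}.$$

These three matrices complete the task. Thus, for this
case, only three layers are required to perform the
operation. This again can be deduced by observing
that the transfer matrix has its unit elements at a
distance from the diagonal not exceeding three positions.
The hatched interaction layers in Fig. 5 designate
the ON elements. This specific example demonstrated
also the property of redundancy that may lead
to fault tolerance when production limitations are considered.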
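
The decomposition described in this section can also be carried out mechanically. The sketch below does not follow the row-by-row inspection used in the text; it substitutes the standard odd-even transposition rule: each signal is labeled with its destination channel, and a coupler is switched ON whenever the two labels it sees are out of order. The function names and the list `src` (where output `j` receives input `src[j]`, 0-based) are illustrative.

```python
def route(src):
    """Return, layer by layer, the 0-based left channels of the couplers to switch ON."""
    n = len(src)
    dest = [0] * n
    for j, i in enumerate(src):
        dest[i] = j              # input i must end up in output channel j
    labels = dest[:]             # labels[k]: destination of the signal now in channel k
    layers = []
    for layer in range(n):
        first = layer % 2        # 0: pairs (1,2),(3,4),...   1: pairs (2,3),(4,5),...
        on = []
        for k in range(first, n - 1, 2):
            if labels[k] > labels[k + 1]:
                labels[k], labels[k + 1] = labels[k + 1], labels[k]
                on.append(k)
        layers.append(on)
    if labels != sorted(labels):
        raise RuntimeError("routing did not converge")
    return layers


def apply_network(a, layers):
    signals = list(a)
    for on in layers:
        for k in on:
            signals[k], signals[k + 1] = signals[k + 1], signals[k]
    return signals


# Six-channel example: b1 = a3, b2 = a2, b3 = a6, b4 = a1, b5 = a4, b6 = a5.
src = [2, 1, 5, 0, 3, 4]
layers = route(src)
outputs = apply_network(["a1", "a2", "a3", "a4", "a5", "a6"], layers)
```

For the six-channel interconnection of Fig. 5 this rule switches couplers in the first three layers only and leaves the remaining three layers OFF, consistent with the observation that three layers suffice for this pattern.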

All the matrices involved until now are unitary matrices
as we are dealing with ideal nondissipative systems.
In the next section we modify the formalism to
include losses and leaky switching elements as encountered
in practice.


IV. Real Networks

A real physical network cannot be described by the
above unitary matrices. To take into account losses
and crosstalk in the nonideal switching elements, the
basic switching matrix of Eq. (2) should be modified.
The two states of a real Fredkin gate may thus be
represented by the two modified matrices:

$$F(0) = \begin{bmatrix} 1 - \alpha & \beta \\ \beta & 1 - \alpha \end{bmatrix}, \qquad
F(1) = \begin{bmatrix} \gamma & 1 - \delta \\ 1 - \delta & \gamma \end{bmatrix}, \tag{8}$$

where $\alpha$ is the loss from the unswitched channel including
actual loss and leakage $\beta$ into the second channel,
and $\delta$ is the uncoupled fraction into the switched
channel with $\gamma$ the fraction of the signal that leaks
through undeflected. For simplicity a complete symmetry
is assumed between the two coupled channels.
For a working system one naturally must require that
$\alpha, \beta, \gamma, \delta \ll 1$. Integrating this gate into an interconnection
array returns us to the block-diagonal matrices
of Eq. (4), but now they are not unitary as they include
the lossy matrices [Eq. (8)] instead of the ideal ones of
Eq. (2).

To investigate the effects of the deteriorating parameters
we return to the four-channel system of Fig. 4
and construct the transformation matrix for one of the
most difficult transformations, i.e., a complete inversion
with input vector

$$\mathbf{a}^T = (1, 1, 1, 0).$$

For this transformation all switching elements are in
the ON state. Thus we have to substitute F(1) for all
the diagonal blocks in four matrices of the form of Eq.
(4). Performing the matrix multiplications and operating
on the above input vector yield the output vector

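$$\mathbf{b} = \begin{bmatrix}
\gamma[\gamma + (1-\delta)^2] + \gamma(1-\delta)[2\gamma + (1-\delta)^2 + (1-\delta)(1+2\gamma)] \\
\gamma^4 + \gamma(1-\delta)(1+\gamma)(2-\delta) + 2\gamma^3(1-\delta) + (1-\delta)^3 \\
\gamma^4 + \gamma(1-\delta)^2(2+\gamma) + (1-\delta)[2\gamma^3 + (1-\delta)^2] \\
(1-\delta)^3 + \gamma(1-\delta)^2(2\gamma+1) + \gamma(1-\delta)[(1-\delta)^2 + \gamma^2 + \gamma]
\end{bmatrix}. \tag{9}$$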

The ideal transformation would give $b_1 = 0$ with the
other three elements 1. Thus we may define a SNR by
the relation $b_4/b_1$ giving

$$\mathrm{SNR} = \frac{(1-\delta)^3 + \gamma(1-\delta)^2(2\gamma+1) + \gamma(1-\delta)[(1-\delta)^2 + \gamma^2 + \gamma]}
{\gamma[\gamma + (1-\delta)^2] + \gamma(1-\delta)[2\gamma + (1-\delta)^2 + (1-\delta)(1+2\gamma)]}. \tag{10}$$


Retaining only first-order terms we obtain

$$\mathrm{SNR} \approx \frac{1}{3\gamma}. \tag{11}$$

This result could be anticipated since there are three
switching elements where a fraction of the unit signal
could leak into the zero channel, while the losses from
the unit carrying channels are compensated to first
order by leakage from the other large-signal channels.
Again, by induction, one may generalize this first-order
approximation to n channels leading to an expected
SNR for a physical network given by

$$\mathrm{SNR} \approx \frac{1}{(n-1)\gamma}. \tag{12}$$

Interpreting some experimental results12-14 one may
assume the attainable values, $1 - \delta = 0.95$ and $\gamma = 0.001$,
yielding an SNR (>2) up to 500 channels.
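
The first-order estimate can be compared with a direct numerical propagation through four lossy layers. The sketch below rests on assumptions that go beyond the text: every coupler is ON and mixes its two channels with amplitudes $\gamma$ (straight through) and $1 - \delta$ (crossed), channels not paired in a given layer pass with unit transmission, and the single-coupler layer comes first. The function names are illustrative.

```python
def first_order_snr(n, gamma):
    """Eq. (12): SNR ~ 1 / ((n - 1) * gamma)."""
    return 1.0 / ((n - 1) * gamma)


def lossy_inversion_snr(gamma, delta):
    """Propagate a = (1, 1, 1, 0) through four all-ON lossy layers; return b4 / b1."""
    d = 1.0 - delta
    signals = [1.0, 1.0, 1.0, 0.0]
    # Assumed layer order: one coupler (channels 2-3), then two couplers, and so on.
    for layer in [(1,), (0, 2), (1,), (0, 2)]:
        for k in layer:
            x, y = signals[k], signals[k + 1]
            signals[k] = gamma * x + d * y       # leaked part of x plus crossed part of y
            signals[k + 1] = d * x + gamma * y
    return signals[3] / signals[0]


exact_4 = lossy_inversion_snr(gamma=0.001, delta=0.05)
approx_4 = first_order_snr(4, gamma=0.001)
snr_500 = first_order_snr(500, gamma=0.001)
```

For $\gamma = 0.001$ and $1 - \delta = 0.95$ the propagated four-channel ratio is roughly 320, within a few percent of the first-order value $1/(3\gamma) \approx 333$. For 500 channels Eq. (12) gives a value just above 2, in line with the figure quoted in the text.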

V.  Optical  Fredkin  Gate  Crossbar 

The major function performed by the optical networks
described in this work is that of a cross-connector,
i.e., the capability to connect any input channel to
any output channel. In previously proposed optical
crossbars the light input to each channel is spread over
all the output channels, and the required connections
are obtained by blocking the unwanted connections.
From the point of view of the optical design engineer
these are blocking crossbars that, for an n-channel
system, are only 1/n as light efficient as our nonblocking
network, where, in an ideal device, all the incident
light is utilized for signal transmission. Also, as pointed
out earlier, $n(n - 1)/2$ switching elements are adequate
to perform all interconnections as opposed to $n^2$
elements in the previous optical crossbars. However,
our interconnection network is not completely equivalent
to a crossbar.

Fig. 6. Nonblocking optical crossbar containing two parallel networks.

From the point of view of the network engineer18
those previously proposed crossbars are nonblocking
in the strict sense in that any idle pair of terminals may
be connected without disturbing already established
connections. In this sense our interconnector is not a
crossbar because one may have to reprogram the whole
array to change even a single connection. There is at
least one possible solution for the problem that employs
two identical networks as shown in Fig. 6. The $n$
controlled directional couplers on the left-hand side
are used to switch the whole input pattern between the
two networks. If, for example, a new connection is
required while information flow is in progress through
network I, network II may be programmed to support
the complete new connection pattern, and then the
inputs may be switched over to network II. In the next
occasion the inputs will be switched back to network I.
This will be a nonblocking crossbar from the point of
view of the optical design engineer as well as from the
point of view of the network engineer. Switching between
the two networks will not disturb information
flow, since during the short transition time both networks
will transmit the signals (in complementary
amounts of power) that will be combined by the constant
directional couplers of the proper output channels
on the right-hand side of the system. It is interesting
to point out that the achievement of a strictly
nonblocking system was at the expense of additional
switching elements returning to the total of $n^2$.

VI.  Discussion 

The optical Fredkin gate was shown to be an excellent
building block for construction of a programmable
optical interconnection array. Such an array can
perform all interconnection requirements, such as the
function of a crossbar or perfect shuffle. The overall
performance should be significantly better than any
other approach proposed until now. Using page-oriented
holographic memories, this will be the fastest
programmable interconnection network constructed
and, in most cases, it needs only half of the active
elements of any other configuration.

Being nonblocking with respect to light manipulation
results in additional benefits: All the light energy
coupled into the system is being extracted as signal,
except for the inevitable losses encountered in any
physical system. Furthermore, if a defective switch
exists in the array, the signal will in most cases be
transmitted without deflection. This characteristic,
together with the implicit redundancy in the system,
may be utilized for fault tolerant operation. For example,
assume that there is an anticipated fraction ε of
faulty switching elements introduced during the manufacturing
process of an n-channel array. To avoid
the faulty elements it is a simple matter to make an
array that has n(1 + ε) channels (and switching layers)
and then ignore the faulty layers during programming.
The introduction of additional channels can also support
the solution of problems such as FAN-OUT and
FAN-IN.19

Considering the problem of signal deterioration, it
was shown that the Fredkin gate network should perform
comparably with an optically blocking network
that has a constant SNR similar to the worst case SNR
in the present system.

In conclusion, one may state that the optical Fredkin
gate array may turn out to become the best solution for
the implementation of optical interconnections. Recalling
the fact that these gates can also perform logic
operations,6 they should be seriously considered as the
basic building blocks for a future digital optical computer.

This work was partially supported by the Office of
Naval Research under contract N000014-86-K

Three-dimensional optical interconnection gate array


Joseph  Shamir 


A  recently  proposed  planar  Fredkin  gate  array  for  optical  interconnections  is  extended  here  into  a  3-D  array 
that  can  be  implemented  using  ferroelectric  liquid  crystal  spatial  light  modulators.  Operating  as  polarization 
gates  these  modulators  are  efficient  and  can  be  incorporated  into  high  performance  interconnection  networks. 
Some  advantages  of  the  new  architecture  are  discussed  and  performance  characteristics  are  estimated. 


I.  Introduction 

One of the most promising uses of optics in computing
and communications is the implementation of
complicated interconnections. For this application,
planar architectures of Fredkin gate arrays constructed
of optical waveguide couplers were recently investigated.1
This work demonstrated that these arrays are
efficient with respect to the utilization of light power
and that they are rapidly programmable. In addition to
their use in optical interconnection networks these
arrays can also be employed in various processing operations
such as residue arithmetic,2 logic gate arrays,
and variable delay lines. The planar configuration is
attractive for applications in conjunction with integrated
optical and electronic devices; however, the advantages
of the 2-D parallelism possible with optical
systems were not fully exploited. In the present work
we explore the performance of 3-D architectures and
indicate their implementation using polarization
Fredkin gate arrays.

II.  Planar  Interconnection  Network 

We start with a short review of the planar interconnection
array of n channels that was investigated in
Ref. 1. One possible implementation of such an array
employs controllable waveguide couplers as represented
schematically in Fig. 1 for a seven-channel waveguide
array. In this array there are seven channels
with respective input signals, a_i (i = 1, ..., 7), seven
outputs, b_i, and seven layers of couplers (switches) that
are either OFF or ON. When a switch is in the OFF state
the signal in each channel is transmitted through the
coupling region and remains in its original channel.
With the switch in the ON state the signals are interchanged
between adjacent channels. It was shown in
Ref. 1 that, for such a configuration containing n channels
(i = 1, ..., n), one may obtain b_i with all the possible
permutations of a_i using n switching layers with a total
number of n(n - 1)/2 switches. In a practical situation
where the switching elements are not ideal one may
assign some average parameter, γ, for the fraction of
the signal that leaks through the coupler into the unwanted
channel and obtain an approximate value1 for
the signal-to-noise ratio (SNR) at the output:

$$\mathrm{SNR} \approx \frac{1}{(n - 1)\gamma}. \tag{1}$$
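The claim that n layers of adjacent-channel switches suffice for every permutation can be checked with a short sketch. It assumes a layout in which even-numbered layers couple channel pairs (0,1), (2,3), ... and odd-numbered layers couple (1,2), (3,4), ..., and it programs the switches with an odd-even transposition sort on the destination labels. The function names and this programming rule are illustrative assumptions, not the procedure of Ref. 1.

```python
def program_planar_array(dest):
    """Return ON/OFF states per layer so that input i exits at output dest[i].

    dest must be a permutation of range(n). Layer k is assumed to hold
    couplers between channels (j, j + 1) for j = k % 2, k % 2 + 2, ...
    """
    n = len(dest)
    labels = list(dest)  # labels[c] = destination of the signal now in channel c
    layers = []
    for k in range(n):
        states = []
        for j in range(k % 2, n - 1, 2):
            on = labels[j] > labels[j + 1]  # switch ON when the pair is out of order
            if on:
                labels[j], labels[j + 1] = labels[j + 1], labels[j]
            states.append(on)
        layers.append(states)
    # After n layers of odd-even transposition every signal is in place.
    assert labels == sorted(dest)
    return layers


def route(signals, layers):
    """Propagate signals through the programmed array and return the outputs."""
    out = list(signals)
    for k, states in enumerate(layers):
        for on, j in zip(states, range(k % 2, len(out) - 1, 2)):
            if on:
                out[j], out[j + 1] = out[j + 1], out[j]
    return out


def switch_count(n):
    """Total number of couplers in an n-channel, n-layer array."""
    return sum(len(range(k % 2, n - 1, 2)) for k in range(n))


if __name__ == "__main__":
    layers = program_planar_array([2, 0, 3, 1])
    print(route(["a", "b", "c", "d"], layers))  # ['b', 'd', 'a', 'c']
    print(switch_count(7))                      # 21 = 7 * 6 / 2
```

With this layout the coupler count is n(n - 1)/2 for both odd and even n, in agreement with the figure quoted above.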

III.  Three-Dimensional  Arrays 

To improve the performance of the system by exploiting
the 2-D capabilities of an optical system one
may stack m planar arrays (such as in Fig. 1), each of n
channels, into a 3-D architecture [Fig. 2(a)]. The
switching layers are now arranged as matrices over
transversal planes but each planar array is independent
of the others. Extending the earlier analysis, it is
easy to see that for the stack of n-channel arrays one
needs n switching layers to perform all possible horizontal
interconnections. To make all vertical connections
available too, we augment the configuration by n
vertically oriented planar arrays [Fig. 2(b)] of m channels
each, containing m switching layers. Thus a complete
interconnection array [a cascade of Figs. 2(a) and
2(b)] can be implemented using m layers of n(n - 1)/2
switching elements and n layers with m(m - 1)/2
switching elements summing up to a total of

$$N = mn(n - 1)/2 + nm(m - 1)/2 = nm(n + m - 2)/2 \tag{2}$$

switching elements. With a square array of n^2 channels
(m = n) one needs n^2(n - 1) switching elements
compared with n^4 of a regular planar crossbar. It
should be pointed out, however, that FAN-IN and FAN-OUT
operations with this simple configuration are possible
only at the expense of additional channels as
indicated in Ref. 2.

Fig. 1. Seven-channel planar interconnection array.

Fig. 2. (a) Stack of m planar arrays of n horizontal channels each,
(b) Stack of n planar arrays of m vertical channels each.
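As a quick numerical check of Eq. (2), the following sketch compares the switch count of the cascaded 3-D array with the n^4 elements of a planar crossbar serving the same n^2 channels. The function names are illustrative only.

```python
def switches_3d(n, m):
    """Switching elements in the cascade of Figs. 2(a) and 2(b), per Eq. (2)."""
    horizontal = m * (n * (n - 1) // 2)  # m planar arrays of n channels each
    vertical = n * (m * (m - 1) // 2)    # n planar arrays of m channels each
    return horizontal + vertical


def switches_planar_crossbar(channels):
    """Elements in a conventional crossbar: one per input/output pair."""
    return channels ** 2


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, switches_3d(n, n), switches_planar_crossbar(n * n))
```

For n = 8 this gives 448 elements for the 3-D array against 4096 for the crossbar.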

Regarding the SNR, one may repeat the calculations1
that lead to Eq. (1) or just observe that, to first
order, it is inversely proportional to the number of
switching layers. Thus in our case we may write, instead
of Eq. (1),

$$\mathrm{SNR} \approx \frac{1}{(n + m - 1)\gamma}, \tag{3}$$

which is an appreciable improvement compared with
the planar array where for N (= n × m in the present
case) channels the sum in the denominator would have
to be replaced by the product (n × m). Waveguide
arrays as described in Ref. 1 are ideal for planar networks;
however, for this 3-D architecture different
kinds of devices may prove more useful.

IV.  Polarization  Gate  Arrays 

In Refs. 3 and 4 polarization logic gates were proposed
while, independently, in Ref. 5, a similar model
was proposed for the implementation of optical Fredkin
gates. In these gates the switching operation rotates
the polarization of an incident beam by 90°. When
employed as a logic gate one polarization is defined as
logic 1 while the orthogonal polarization is defined as a
logic 0. In this work we use the Fredkin gate definition:
the two orthogonal polarizations represent the
two separate input channels to the gate while the
switching operation interchanges the two channels.
In each of these channels the presence or absence of a
signal indicates the logic 1 and 0, respectively.

Polarization Fredkin gates may be implemented by
various electrooptic or magnetooptic modulators. For
the present purpose, the most promising device is the
ferroelectric liquid crystal spatial light modulator
(FLC) that already exists in the form of large arrays.
Each pixel of the FLC can be addressed separately to
switch ON or OFF a halfwave retardation, thus fulfilling
the requirement of a polarization Fredkin
gate.

The top view of a section of the proposed polarization
interconnection array is shown schematically in
Fig. 3. A suitably designed Wollaston prism is employed
to combine two channels into a single gate
element (pixel). After transmission through the gate a
second, similar Wollaston prism separates the two polarizations
(channels) and directs them toward two
adjacent gates in the next stage that is shifted transversally
by half of the distance between pixels. The
layout of each horizontal plane resembles the planar
waveguide array of Fig. 1, and each FLC sandwiched
between two Wollaston prisms performs the function
of a 2-D coupling array as required in Fig. 2(a). Using
FLCs in arrays of n × m we may implement the complete
interconnection network with 2n stages (each
pixel in the FLC represents two signal channels) performing
the horizontal interconnection between the 2n
channels similar to Fig. 2(a). To implement the vertical
interconnections required in the architecture of
Fig. 2(b) one needs 2m additional stages with the Wollaston
prisms rotated by 90°.

To estimate the SNR of an interconnection network
one may use the reported switching contrast ratio of
~100:1. Deducing from this a signal leakage value of
~0.01 we obtain for a square array of n × n gates (4n ×
n channels) by Eq. (3),

$$\mathrm{SNR} \approx \frac{1}{0.01 \times (4n - 1)}.$$

Thus the SNR with presently available gate arrays will
be better than 2 up to n = 12, i.e., a total of ~500
channels that can be switched at a rate approaching 1
MHz. Research on this kind of spatial light modulator
indicates that the above numbers may be appreciably
improved in the future.
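The n = 12 limit can be reproduced with a few lines. The sketch assumes the first-order form SNR ≈ 1/[(L - 1)γ] for L switching layers, which is an assumed reading of Eqs. (1) and (3), with L = 4n for the square polarization array. The function names are illustrative.

```python
def snr_first_order(layers, leakage):
    """First-order SNR estimate for a cascade of `layers` switching layers.

    Assumed form: SNR ~ 1 / ((layers - 1) * leakage).
    """
    return 1.0 / ((layers - 1) * leakage)


def max_square_array(leakage=0.01, min_snr=2.0):
    """Largest n for which an n x n gate array (4n stages) keeps SNR above min_snr."""
    n = 1
    while snr_first_order(4 * (n + 1), leakage) > min_snr:
        n += 1
    return n


if __name__ == "__main__":
    print(max_square_array())                 # 12
    print(round(snr_first_order(48, 0.01), 2))  # 2.13
```

Under this assumption the SNR at n = 12 is about 2.13 and drops below 2 at n = 13, which matches the limit stated in the text.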

V.  Conclusions 

Exploiting the 2-D addressing capabilities in optical
systems, it was shown that efficient programmable
interconnection networks can be implemented in a 3-D
architecture. Even using existing liquid crystal spatial
light modulators that were not designed for the
present purpose, high density and low-loss networks
are possible. In addition to their application in interconnection
networks these arrays may become useful
in other fields, such as optical logic gate arrays, arithmetic
processors, programmable delay lines, phased
arrays, and wideband signal analyzers.

It is a pleasure to thank K. M. Johnson for stimulating
discussions about the ferroelectric liquid crystal
gate arrays.


This work was partially supported by the Office of
Naval Research under contract N00014-86-K-0591.

Residue arithmetic processing utilizing optical
Fredkin gate arrays


Mir M. Mirsalehi, Joseph Shamir, and H. John Caulfield


A cascadable residue arithmetic processor based on optical Fredkin gate arrays and page-oriented holographic
memories is introduced. The implementations of residue functions and operations by this processor are
described.  Analytic  expressions  are  derived  for  the  number  of  holograms  and  waveguide  channels  required 
for  the  implementation  of  residue  addition  and  multiplication.  The  practical  cases  of  16-bit  addition  and 
multiplication  are  analyzed  as  specific  examples.  It  is  shown  that,  using  the  proposed  architecture,  these 
operations  can  be  implemented  with  state-of-the-art  technologies  in  holography  and  integrated  optics. 


I.  Introduction 

There is a growing interest in the field of digital
optical computing.1 To obtain digital optical processors
that greatly surpass the performance of the
present computers, the inherent advantages of optics
should be utilized. Two major advantages of optics
are interconnection and parallelism. Global interconnections
can be achieved by classical optical devices,
such as prisms and lenses,2 or by holograms.3,4 Also, it
has been recently shown that an array of optical Fredkin
gates constitutes a very efficient and versatile interconnection
network.5,6 Parallel processing can be
achieved easily in optics by manipulating the elements
of a 2-D array. To take full advantage of the parallelism
in optics, digital techniques that are suitable for
parallel processing can be utilized. One of these techniques
is residue arithmetic, which is based on the
residue number system (RNS). The main advantage
of the RNS is that its digits are independent of each
other; e.g., there is no carry in addition. This allows
simultaneous operation on all digits.

The purpose of this paper is to show how an array of
optical Fredkin gates can be used to realize residue
arithmetic. To provide the required background, residue
arithmetic and Fredkin gates are briefly described
in Sec. II. The general realization of residue arithmetic
with optical Fredkin gates is introduced in Sec. III,
while the implementation of residue addition, multiplication,
and other operations is described in Secs.
IV, V, and VI. Finally, in Sec. VII, the potential
characteristics of this processor are summarized.

II. Background on Residue Arithmetic and Fredkin
Gates

A.  Residue  Arithmetic 

The foundation of residue arithmetic dates back to
the first century A.D., when the Chinese mathematician
Sun-Tsu published a verse in which he gave an
algorithm for finding a number whose remainders on
division by 3, 5, and 7 are known. A general theory of
remainders (now known as the Chinese remainder
theorem) was established by the German mathematician
K. F. Gauss in the nineteenth century. The application
of residue arithmetic in computers, however,
is relatively recent and was first introduced in 1955 by
Svoboda and Valach in Czechoslovakia.7

Unlike the commonly used binary and decimal number
systems, the residue number system (RNS) is an
unweighted system. The base of a residue system
consists of n pairwise relatively prime (having no common
factor) numbers, m_1, m_2, ..., m_n, called moduli.
Any integer X can then be represented by an n-tuple
(x_1, x_2, ..., x_n), where x_i = |X|_{m_i} (read X mod m_i) is the
positive remainder that is obtained from the division
of X by m_i. This representation is unique for a dynamic
range of

$$M = \prod_{i=1}^{n} m_i.$$

An important feature of the RNS is that the fixed-point
arithmetic operations can be performed on each
digit individually. That is, if X = (x_1, x_2, ..., x_n) and
Y = (y_1, y_2, ..., y_n) are two numbers of the same residue
system, Z = X * Y = (z_1, z_2, ..., z_n), where z_i =
|(x_i * y_i)|_{m_i} for i = 1, 2, ..., n, and * represents addition,
subtraction, or multiplication. Division can be performed,
but it is difficult except for the remainder zero
case.8

As an example, consider the set of four moduli {5, 7,
8, 9}. These moduli cover a dynamic range of 2520. In
this residue system, the decimal numbers X = 42 and Y
= 31 are represented as X = (2, 0, 2, 6) and Y = (1, 3, 7, 4).
The results of performing addition, subtraction, and
multiplication on these numbers are X + Y = (3, 3, 1, 1),
X - Y = (1, 4, 3, 2), and X · Y = (2, 0, 6, 6), which are the
residue representations of the correct answers, i.e., 73,
11, and 1302, respectively.
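The example above can be reproduced with a short sketch. The encoding and digit-wise operations follow the definitions in the text; the decoding step uses the Chinese remainder theorem in its standard constructive form, which the text mentions but does not spell out. The function names are illustrative, and `pow(a, -1, m)` requires Python 3.8 or later.

```python
import operator
from math import prod

MODULI = (5, 7, 8, 9)  # pairwise relatively prime, dynamic range 2520


def to_rns(x, moduli=MODULI):
    """Encode an integer as its tuple of residues."""
    return tuple(x % m for m in moduli)


def rns_op(a, b, op, moduli=MODULI):
    """Apply +, -, or * digit by digit; no carries cross between digits."""
    return tuple(op(x, y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Decode with the Chinese remainder theorem (result in 0 .. M - 1)."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # pow(.., -1, m) is the modular inverse
    return total % M


if __name__ == "__main__":
    X, Y = to_rns(42), to_rns(31)
    for op in (operator.add, operator.sub, operator.mul):
        Z = rns_op(X, Y, op)
        print(Z, from_rns(Z))
```

Running it yields the residue tuples quoted in the text and decodes them back to 73, 11, and 1302.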

B. Fredkin Gates

The  basic  Fredkin  gate  has  three  binary  inputs  and 
three  binary  outputs  (Fig.  1).  The  control  input  C 
determines  the  operation  of  the  gate  according  to  the 
following  rules: 

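$$
\begin{aligned}
C' &= C,\\
A' &= A \ \text{and}\ B' = B, \quad \text{if } C = 0,\\
A' &= B \ \text{and}\ B' = A, \quad \text{if } C = 1.
\end{aligned}
\tag{1}
$$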

The Fredkin gate constitutes a functionally complete set in
Boolean algebra. That is, any binary logic operation,
such as AND, OR, and NOT, can be realized by Fredkin
gates. The application of optical Fredkin gates as
interconnecting systems is of special interest. It has
been shown recently that an array of optical Fredkin
gates can operate as a very efficient interconnection
network with parallel addressing capabilities.6
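A minimal sketch of the gate rules in Eq. (1), together with one common way of obtaining NOT, AND, and OR from a single gate by fixing one data input to a constant. The particular constant assignments are a standard construction chosen for illustration and are not taken from the references.

```python
def fredkin(c, a, b):
    """Fredkin gate: C passes through; A and B are swapped when C = 1."""
    return (c, b, a) if c else (c, a, b)


def NOT(x):
    # Inputs A = 0, B = 1: output B' equals 1 - x.
    return fredkin(x, 0, 1)[2]


def AND(x, y):
    # Inputs A = y, B = 0: output B' equals y only when x = 1.
    return fredkin(x, y, 0)[2]


def OR(x, y):
    # Inputs A = y, B = 1: output A' equals 1 when x = 1, otherwise y.
    return fredkin(x, y, 1)[1]


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, NOT(x), AND(x, y), OR(x, y))
```

Note that the gate conserves the number of 1s among its inputs, which is the logical counterpart of the light-conserving behavior discussed for the interconnection arrays.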

Optical Fredkin gates can be implemented by various
techniques.5 Here we are interested in the implementation
by the waveguide couplers shown in Fig. 2.
The two inputs, A and B, are switched when the interaction
region I is activated by the control signal C.
The most efficient construction involves a photorefractive
interaction region directly activated by light.
However, the electrooptic effect may also be used by
employing an amplified signal from a photodetector
that receives the C input.

The waveguide coupler of Fig. 2 is the basic building
block for constructing a general interconnection network.
As an example, a four-input, four-output network
is shown in Fig. 3. Checking all possible switching
combinations, one can show that with this
arrangement any input signal a_i (i = 1, 2, 3, 4) may be
coupled into any output port b_j. In other words, all
twenty-four permutations of the four inputs are possible
with four layers of switches and a total of six switches.
In general, for n = 2N channels, one needs n
interaction layers and n(n - 1)/2 switches to establish
all n! possible permutations of the inputs.

III. Implementation of Residue Arithmetic by Optical
Fredkin Gates

The calculations in residue arithmetic have a cyclic
nature. Therefore, they can be implemented by physical
properties that are also cyclic in nature. Using the
cyclic property of the phase or polarization of light,
optical residue-based processors have been developed.9-12
A major problem with these implementations
is that precise control of the phase or polarization
of light is usually difficult and requires bulky devices.

A better technique is to use positional coding for
data representation.13-18 The input and output of a
residue processor modulo m can have integer values
from zero to m - 1. Since the modulus is usually a
small number, it is practical to have m channels corresponding
to these values. An input number is then
coded as the presence of light in the channel that
corresponds to its value. Any process on the input
data is possible by coupling the light from the input
channel to the appropriate output channel using an
interconnecting system.

Fig. 2. Waveguide coupler implementation of the Fredkin gate.
I is the interaction region where coupling is switched ON or OFF.

Fig. 3. Fredkin gate array of four channels and four switching
layers.

Fig. 4. Example implementations of functions in residue arithmetic
by interconnecting systems: (a) addition of 2 to a residue number
modulo 4; (b) raising a residue modulo 4 number by power 3.
The input is entered from the left, and the output is obtained from
the right.

As illustrative examples, two interconnections that
implement residue functions modulo four are shown in
Fig. 4. The system in Fig. 4(a) adds two to an input
number in residue arithmetic. Using modulo four, the
possible values of the input number are 0, 1, 2, and 3.
With the above operation, these values are mapped to
2, 3, 0, and 1, respectively. Figure 4(b) shows a system
that provides the third power of a residue number
modulo 4. Other residue functions can be realized by
similar interconnections.
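Positional coding turns any residue function into a fixed wiring between m input and m output channels. The sketch below builds the two maps of Fig. 4 and applies them to a one-hot "light in channel v" input. The helper names are illustrative.

```python
def residue_map(func, m):
    """Wiring table: input channel v is connected to output channel func(v) mod m."""
    return [func(v) % m for v in range(m)]


def apply_map(mapping, value):
    """Send light into channel `value` and report which output channel is lit."""
    m = len(mapping)
    light_in = [ch == value for ch in range(m)]  # positional (one-hot) coding
    light_out = [False] * m
    for ch, lit in enumerate(light_in):
        if lit:
            light_out[mapping[ch]] = True
    return light_out.index(True)


if __name__ == "__main__":
    add_two = residue_map(lambda v: v + 2, 4)
    cube = residue_map(lambda v: v ** 3, 4)
    print(add_two)  # [2, 3, 0, 1]
    print(cube)     # [0, 1, 0, 3]
    print(apply_map(add_two, 3))  # 1
```

The add-two table is a permutation of the channels, whereas the cube table sends both 0 and 2 to channel 0, so it is not one-to-one.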


15 September 1987 / Vol. 26, No. 18 / APPLIED OPTICS 3941


Optical Fredkin gates in conjunction with page-oriented
holographic memories can be used to implement
the interconnections required for residue arithmetic.
Figure 5 shows such a processor that uses modulo 4.
Starting from the top, the four channels correspond to
integers 0, 1, 2, and 3. Depending on the processes of
interest, a number of holograms are recorded at different
locations of a holographic material. The input
number is coded as the presence of light in one of the
input channels on the left, and a laser beam is deflected
to a particular hologram corresponding to the required
process. The reconstructed beams activate some of
the switching elements, coupling the light from the
input channel to the appropriate output channel.

The above processor can be realized with present
technology. Optical waveguide couplers can be fabricated
using integrated optics technology.19 Different
holographic materials such as photographic films,
dichromated gelatin, thermoplastic materials, or photorefractive
crystals can be used for recording.20 Finally,
the deflection of the laser beam can be achieved
by an acoustooptic cell.21 With the progress in the
technology of spatial light modulators, they may replace
the combination of the acoustooptic deflector
and hologram. However, their operation will be relatively
slow. In the following two sections, the implementations
of residue addition and multiplication
with this architecture are analyzed in more detail.

IV.  Residue  Addition 

To implement a residue operation on two numbers,
one of the numbers N1 is used as the input to the
system, while the other number N2 is used for selecting
the proper interconnection. To illustrate this point,
Fig. 6 shows the four types of interconnection (maps)
that are needed for implementing residue addition
modulo 4. One of these maps [Fig. 6(a)] is a
straight-through interconnection which can be obtained
by default; there is no need to activate any
switches. Each of the other three interconnections
can be realized by activating some of the switches.
Therefore, the whole residue addition modulo 4 operation
can be implemented with four channels and only
three holograms (Fig. 7). In general, the implementation
of residue addition modulo m requires a Fredkin
gate array of m channels and m layers, thus m(m - 1)/2
switches, and recording m - 1 holograms.

In practical cases, a digital system should have a
large dynamic range. This can be achieved by choosing
a set of pairwise relatively prime moduli. The
optimum set of moduli, in the sense of covering the
required dynamic range with minimum number of holograms,
consists of numerous small moduli which are
either prime or powers of prime numbers. The procedure
for selecting such a set of moduli for a required
dynamic range can be found in Ref. 22.

As an example, a 16-bit fixed-point operation requires
a dynamic range of 2^16 = 65,536. The optimum
set of moduli for this case is {3, 5, 7, 8, 11, 13}, which
covers a dynamic range of 120,120. Each modulus m_i
is treated individually by devoting m_i channels to it.


Fig. 5. Schematic diagram of the proposed processor: POHM,
page-oriented holographic memory; OFGA, optical Fredkin gate
array.

Fig. 6. Interconnections corresponding to residue addition (N1 +
N2) modulo 4. The interconnections (a), (b), (c), and (d) correspond
to N2 = 0, 1, 2, and 3, respectively. The input N1 is entered from the
left, and the output N1 + N2 is obtained from the right.

Fig. 7. Required switching states for implementing residue addition
(N1 + N2) modulo 4. The hatched switching elements are ON.
The four interconnections realized in (a), (b), (c), and (d) correspond
to N2 = 0, 1, 2, and 3, respectively.

Fig. 8. Interconnections corresponding to residue multiplication
(N1 × N2) modulo 4. The interconnections (a), (b), (c), and (d)
correspond to N2 = 0, 1, 2, and 3, respectively. The input N1 is
entered from the left, and the output N1 × N2 is obtained from the
right.

Considering residue addition, the number of required
holograms corresponding to moduli 3, 5, 7, 8, 11, and 13
are 2, 4, 6, 7, 10, and 12, respectively. The number of
switches corresponding to the above moduli are 3, 10,
21, 28, 55, and 78, respectively. Therefore, the 16-bit
fixed-point addition can be implemented in residue
arithmetic by a page-oriented holographic memory
consisting of forty-one patterns and a waveguide gate
array consisting of forty-seven channels and 195
switches.
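The resource totals for 16-bit addition follow directly from the per-modulus counts given earlier (m channels, m(m - 1)/2 switches, m - 1 holograms). A short sketch, with illustrative function names, that tallies them and also generates the cyclic-shift maps of Fig. 6:

```python
from math import prod


def addition_map(m, n2):
    """Interconnection for adding n2 modulo m: channel v goes to (v + n2) mod m."""
    return [(v + n2) % m for v in range(m)]


def addition_resources(moduli):
    """Totals for residue addition over a set of moduli."""
    return {
        "dynamic_range": prod(moduli),
        "channels": sum(moduli),
        "switches": sum(m * (m - 1) // 2 for m in moduli),
        # The n2 = 0 map is straight-through and needs no hologram.
        "holograms": sum(m - 1 for m in moduli),
    }


if __name__ == "__main__":
    print([addition_map(4, n2) for n2 in range(4)])
    print(addition_resources((3, 5, 7, 8, 11, 13)))
```

Every addition map is a permutation of the channels, which is why a single hologram per nonzero N2 is sufficient.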

V.  Residue  Multiplication 

The implementation of residue multiplication by
Fredkin gates is not as easy as the residue addition
case. This is due to the difference that exists between
the types of interconnection needed for these operations.
Residue addition has the property that each
possible value has the same number of occurrences in
the output. Also, the mappings corresponding to residue
addition are one-to-one (onto). These properties
are not valid for residue multiplication.23 For example,
the four interconnections corresponding to residue
multiplication modulo four are shown in Fig. 8. It can
be seen that the occurrences of the output values are
not the same and that two of the mappings [(a) and (c)]
are not one-to-one. Using Fredkin gate arrays, any
permutation of the input signals can be achieved.
However, no two input signals can be coupled into the
same output port. Therefore, Fredkin gate arrays are
naturally suitable for onto mappings, and some modifications
are required to implement a general case as
described in the following subsections.

A.  Increasing  the  Number  of  Holograms 

One method for implementing residue multiplication
with optical Fredkin gates is to increase the number
of holograms. The selection of the appropriate
hologram for a specific case then depends on both
input numbers. As an example, we discuss residue
multiplication modulo 4. The realization of the interconnection
for the N2 = 0 case [Fig. 8(a)] requires the
recording of three holograms corresponding to N1 = 1,
2, and 3. The case of N1 = N2 = 0 does not need a
hologram, since zero-to-zero coupling does not require
any switches to be activated. Similarly, the N2 = 1
case [Fig. 8(b)] does not require any holograms, since it
corresponds to a straight-through interconnection.
The N2 = 2 case [Fig. 8(c)] requires the recording of two
holograms, one for N1 = 0 and 1, the other one for N1 =
2 and 3. Finally, the N2 = 3 case [Fig. 8(d)] requires
one hologram, since it corresponds to an onto mapping.
Therefore, the whole operation of residue multiplication
modulo 4 can be implemented by 3 + 2 + 1 = 6
holograms.

In general, the number of required holograms for
multiplication mod m = p^n, where p is a prime number
and n is a positive integer, can be obtained from

$$N_h = (n + 1)p^n - n p^{n-1} - 2. \tag{2}$$

The derivation of the above formula is provided in
Appendix A. For the special case of n = 1, Eq. (2) is
reduced to N_h = 2p - 3.

As an illustrative example, the numbers of holograms required
for implementing residue multiplication with moduli 3, 5, 7,
8 (= 2^3), 11, and 13 are 3, 7, 11, 18, 19, and 23,
respectively. The 16-bit fixed-point multiplication that uses
the above moduli can, therefore, be implemented by eighty-one
holograms. This is about twice the corresponding number for
16-bit addition. The numbers of required channels and switches
are the same as those for the addition case, i.e., forty-seven
channels and 195 switches.
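
The counts quoted above can be reproduced numerically. The following is a minimal sketch, not part of the original paper: the function names are illustrative, and the brute-force counting rule (one hologram per group of colliding inputs, none for zero-to-zero coupling or for the straight-through N_2 = 1 case) is an assumption drawn from the modulo 4 discussion and Appendix A.

```python
def holograms_formula(p, n):
    """Eq. (2): holograms for multiplication mod p**n with method A."""
    return (n + 1) * p**n - n * p**(n - 1) - 2


def holograms_brute_force(m):
    """Count holograms by enumerating every mapping N1 -> N1*N2 mod m."""
    total = 0
    for n2 in range(m):
        # Count how many inputs N1 land on each output value for this N2.
        hits = [0] * m
        for n1 in range(m):
            hits[(n1 * n2) % m] += 1
        # Assumed rule: inputs that collide need separate holograms.
        needed = max(hits)
        if n2 in (0, 1):
            # Zero-to-zero coupling (N2 = 0) and the straight-through
            # interconnection (N2 = 1) are realized without a hologram.
            needed -= 1
        total += needed
    return total


# Moduli used in the 16-bit example, written as modulus: (p, n).
moduli = {3: (3, 1), 5: (5, 1), 7: (7, 1), 8: (2, 3), 11: (11, 1), 13: (13, 1)}
counts = {m: holograms_formula(p, n) for m, (p, n) in moduli.items()}

print(counts)                 # per-modulus hologram counts from Eq. (2)
print(sum(counts.values()))   # total for the six moduli: 81
```

The enumeration and the closed form agree for every modulus in the example, which is a useful sanity check on Eq. (2) but not a substitute for the derivation in Appendix A.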

This method may be useful for some applications, but the
problem is that the deflection of the laser beam to the
appropriate hologram depends on both input numbers. This is
sometimes difficult to achieve in practice and requires
partial electronic processing. Also, since one of the numbers
should be presented in two forms (as the input to a waveguide
and as the input to the beam deflector), the system is not
cascadable. The method described in the next subsection
overcomes these shortcomings.

B.  Increasing  the  Number  of  Channels 

Another method for implementing residue multiplication is to
increase the number of channels. In this method, the number of
channels that are devoted to each value is determined by the
maximum degeneracy of that value in the output. We demonstrate
the procedure again by the residue multiplication modulo 4
case. As shown in Fig. 8, the maximum degeneracies of the
values 0, 1, 2, and 3 in the output are 4, 1, 2, and 1,
respectively. Therefore, a total of 4 + 1 + 2 + 1 = 8 channels
is needed to implement this operation (Fig. 9). The extra
channels are used to make many-to-one mappings possible. In
the input, only one channel is needed to code each value. The
input values 0, 1, 2, and 3 are coded as the presence of light
in the first, fifth, sixth, and eighth channel, respectively,
leaving the other input channels idle. In the output, the
presence of light in one of the first four channels is an
indication of the result being equal to zero. If the result is
1, light should appear in the fifth channel; if it is 2, light
should appear in either the sixth or seventh channel; and if
it is 3, light should appear in the eighth channel.

Figure 9 shows how the required interconnections for residue
multiplication modulo 4 can be obtained. The four cases shown
in this figure correspond to N_2 = 0, 1, 2, and 3,
respectively. In each case, the switches that should be
activated are marked. Notice that the N_2 = 1 case does not
require activating any switches. Therefore, the whole process
can be implemented by only three holograms. In general, using
this technique, multiplication modulo m requires m - 1
holograms (same as the number required for residue addition).

The fact that more than one channel is devoted to some output
values does not produce a problem in cascading these
processors. One method for cascading is to merge all the
output channels that correspond to a particular value by using
a transition region. Also, notice that the above architecture
used for implementing non-onto mappings can also be used to
handle onto mappings. For example, the same waveguide couplers
that are used for realizing residue multiplication can be used
to realize residue addition as well. In this case, some of the
channels will not be used, since for implementing an onto
mapping, only one channel is needed for each value.

The number of channels required in this technique depends on
the modulus. In general (see Appendix B), the number of
required channels N_c for multiplication mod m = p^n, where p
is a prime number and n is a positive integer, is given by

N_c = (n + 1)p^n - np^{n-1}.  (3)

For the special case, where n = 1, Eq. (3) is reduced to
N_c = 2p - 1.
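
The notion of maximum degeneracy can be made concrete by enumerating the multiplication table directly. This is a rough sketch under the assumption that the channel count is simply the sum, over output values, of the largest number of inputs that any single N_2 maps onto that value; the function names are illustrative and do not come from the paper.

```python
def max_degeneracies(m):
    """Largest number of inputs N1 mapped onto each output value, over all N2."""
    worst = [0] * m
    for n2 in range(m):
        hits = [0] * m
        for n1 in range(m):
            hits[(n1 * n2) % m] += 1
        # Keep, for each output value, the worst case seen so far.
        worst = [max(w, h) for w, h in zip(worst, hits)]
    return worst


def channels_formula(p, n):
    """Eq. (3): channels for multiplication mod p**n with method B."""
    return (n + 1) * p**n - n * p**(n - 1)


degeneracy_mod4 = max_degeneracies(4)
print(degeneracy_mod4)         # degeneracies of the outputs 0, 1, 2, 3
print(sum(degeneracy_mod4))    # channels needed for modulo 4
print(channels_formula(2, 2))  # the same count from Eq. (3)
```

For modulo 4 the enumeration gives the degeneracies 4, 1, 2, and 1 stated in the text, and the sum matches Eq. (3).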

It is interesting to note that the number of channels in this
method is very close to the number of holograms in method A.
In fact, the two numbers differ by a constant of 2. This
difference is due to the two interconnections (corresponding
to the N_1 = N_2 = 0 and N_2 = 1 cases) that are realized by
default in the first method. If two holograms are considered
for these cases, the two numbers become identical.

Another interesting point is that, although the number of
input and output channels in Fig. 9 is eight, only five
interaction layers and seventeen switches are used, because
not all permutations of the input channels are needed. For
example, if N_1 = 3, the input light does not have to be
coupled to the first three channels of the output. In general,
the number of required interaction layers N_l for residue
multiplication is

N_l = N_c - m + 1,  (4)

where N_c is the number of channels and m is the modulus.
Having the number of layers, the number of switches N_s can
then be found from the corresponding expression for an array
of N_c channels and N_l layers, i.e.,

N_s = [(N_c - 1)N_l/2].  (5)

If N_c is even and N_l is odd, depending on the structure of
the gate array, N_s is the nearest upper or lower integer of
the value inside the brackets. It is possible to design the
array so that N_s is the nearest lower integer.

Fig. 9. Required switching states for implementing residue
multiplication (N_1 × N_2) modulo 4. The hatched switching
elements are ON. The four interconnections realized in (a),
(b), (c), and (d) correspond to N_2 = 0, 1, 2, and 3,
respectively.

As an illustrative example, the numbers of channels required
for implementing residue multiplication with moduli 3, 5, 7,
8 (= 2^3), 11, and 13 are 5, 9, 13, 20, 21, and 25,
respectively. The 16-bit multiplication that uses the above
moduli can, therefore, be implemented by ninety-three
channels. The numbers of interaction layers required for the
above moduli are 3, 5, 7, 13, 11, and 13, respectively. The
corresponding numbers of switches are 6, 20, 42, 123, 110, and
156, which add up to 457. The number of required holograms is
the same as in the 16-bit addition case, i.e., forty-one.
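
The resource figures in this example follow directly from Eqs. (3) through (5). The sketch below tabulates them; it is only an illustration, the helper names are made up for this example, and the switch count is rounded down on the assumption that the array is designed for the nearest lower integer as the text allows.

```python
def channels(p, n):
    """Eq. (3): number of channels for modulus p**n."""
    return (n + 1) * p**n - n * p**(n - 1)


def layers(p, n):
    """Eq. (4): number of interaction layers, N_l = N_c - m + 1."""
    return channels(p, n) - p**n + 1


def switches(p, n):
    """Eq. (5), taking the nearest lower integer of (N_c - 1) * N_l / 2."""
    return (channels(p, n) - 1) * layers(p, n) // 2


# The six moduli 3, 5, 7, 8, 11, 13 written as (p, n) pairs.
prime_powers = [(3, 1), (5, 1), (7, 1), (2, 3), (11, 1), (13, 1)]

# Each row: modulus, channels, layers, switches, holograms (m - 1).
rows = [
    (p**n, channels(p, n), layers(p, n), switches(p, n), p**n - 1)
    for p, n in prime_powers
]
for row in rows:
    print(row)

# Column totals for channels, layers, switches, and holograms.
totals = [sum(column) for column in list(zip(*rows))[1:]]
print(totals)
```

The totals for channels, switches, and holograms come out as 93, 457, and 41, matching the figures given above.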

VI.  Other  Applications 

A major advantage of the second architecture is that it is
cascadable. The output of the processor appears as the
presence of light in a particular position, where an input
channel of the next processor may exist. One possible
application of the cascading property is the evaluation of
polynomials. Horner's rule for polynomial evaluation is well
known. For example,

P(x) = a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0
     = {[(a_4 x + a_3)x + a_2]x + a_1}x + a_0.  (6)

This can be easily pipelined into a set of operations on an
optical input signal using the values of x and a_i as the
inputs to the deflectors (Fig. 10). Since positional coding
has been used for data representation, minor light losses do
not prevent such cascading. Polynomial evaluation is a very
powerful operation because many functions can be represented
accurately by a polynomial.
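
In software terms, each cascaded stage performs one residue multiplication by x followed by one residue addition of the next coefficient. The following is a minimal sketch of Horner's rule in a single residue channel; the function name, the coefficients, and the modulus are arbitrary choices for illustration and are not taken from the paper.

```python
def horner_mod(coefficients, x, m):
    """Evaluate a polynomial modulo m by Horner's rule.

    `coefficients` runs from the highest-degree term down to a_0.
    """
    result = 0
    for a in coefficients:
        # One cascaded stage: multiply by x, add the next coefficient,
        # and reduce modulo m so the value stays within one residue channel.
        result = (result * x + a) % m
    return result


def direct_mod(coefficients, x, m):
    """Reference evaluation: sum the powers explicitly, then reduce."""
    degree = len(coefficients) - 1
    return sum(a * x ** (degree - i) for i, a in enumerate(coefficients)) % m


coeffs = [3, 0, 2, 5, 1]   # a_4, a_3, a_2, a_1, a_0 (illustrative values)
modulus = 7                # illustrative modulus

print(horner_mod(coeffs, 4, modulus))
print(direct_mod(coeffs, 4, modulus))
```

Both evaluations agree for every residue x, since reduction modulo m commutes with addition and multiplication. A full fixed-point evaluation would repeat this in each modulus of the residue number system.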

The proposed architecture is not limited to performing a
series of arithmetic operations. In fact, any mapping in the
residue system can be performed by this processor. To allow
for all possible mappings, m channels should be devoted to
each possible value, where m is the modulus used. Thus an
array of m^2 channels is required for the most general
operation. The input number N_1 can be coded as the presence
of light in one of the channels that correspond to its value.
Depending on the mapping of interest, a particular hologram is
selected by the second input number N_2. The reconstructed
light activates some of the switches and couples the input
light to one of the m output channels that correspond to the
result.

VII. Conclusions

A residue arithmetic processor based on optical Fredkin gate
arrays has been introduced. The processor consists of optical
waveguide couplers and a page-oriented holographic memory. The
components needed for the fabrication of this device can be
achieved with the present technologies in integrated optics
and holography. The device is insensitive to variation in
phase or polarization of light, since positional coding is
used for data representation and processing. And finally, the
processor is cascadable.

Realization of residue functions and operations with this
processor has been described. The implementations of residue
addition and multiplication have been analyzed in detail. The
implementation of residue addition is straightforward, since
all the mappings are onto. Residue multiplication is more
complex, since some of the required mappings are not onto.

Two methods have been described to realize non-onto mappings
with optical Fredkin gate arrays. One method is to increase
the number of holograms without changing the number of
channels. The second method, which appears to be more
powerful, is to increase the number of channels without
changing the number of holograms. The latter technique has the
advantage that the addressing of the holographic memory is
determined by one of the input numbers and, therefore, can be
achieved by a 1-D deflector, such as an acoustooptic cell.

The  proposed  processor  is  not  restricted  to  the  basic 
arithmetic  operations.  It  has  been  shown  that  more 
complex  operations,  such  as  polynomial  evaluation 
and  general  mapping,  can  be  implemented  with  this 
architecture. 

Appendix A: Number of Holograms in Method A

In this Appendix, analytic expressions are derived for the
number of holograms required for implementing residue
multiplication using the method described in Sec. V.A. The two
numbers involved in multiplication are N_1 and N_2, where N_1
is coded as the presence of light in a channel waveguide,
while both N_1 and N_2 are used as the input to the deflecting
system. The required mappings for an example case can be seen
in Fig. 8.

Fig. 10. Cascaded system for evaluating P(x) = a_4 x^4 +
a_3 x^3 + a_2 x^2 + a_1 x + a_0 using Horner's rule.

If the modulus is a prime number (i.e., m = p), all the
mappings, with the exception of the one that corresponds to
the N_2 = 0 case, are onto. This special case requires p - 1
holograms, one for each of the nonzero values of N_1. No
hologram is needed for N_1 = N_2 = 0, since zero-to-zero
coupling does not require any switches to be activated. The
remaining values of N_2 (i.e., 1, 2, ..., and p - 1) produce
onto mappings. The N_2 = 1 case does not require any
holograms, since it corresponds to a straight-through
interconnection. Each of the other cases requires one
hologram. Therefore, the total number of holograms is
N_h = (p - 1) + (p - 2) = 2p - 3.

If the modulus is not prime, more holograms are needed. The
case of interest is when the modulus is a power of a prime
number, i.e., m = p^n. The number of required holograms for
this general case can be obtained by analyzing the mappings
involved as follows:

(1) The N_2 = 0 case maps all the inputs to the zero output.
Hence it requires m - 1 = p^n - 1 holograms.

(2) The N_2 = kp cases, where 0 < k < p^{n-1} and k and p are
relatively prime, map the inputs to the output ports that
correspond to integer multiples of p. There are (p - 1)p^{n-2}
such cases, and each requires p holograms. Therefore,
(p - 1)p^{n-1} holograms are needed for these cases.

(3) In general, the N_2 = kp^q cases, where 0 < q < n,
0 < k < p^{n-q}, and k and p are relatively prime, map the
inputs to the output ports that correspond to integer
multiples of p^q. There are (p - 1)p^{n-q-1} such cases, and
each requires p^q holograms. Therefore, to realize the cases
corresponding to each value of q, the storage of
(p - 1)p^{n-1} holograms is needed. Since q has n - 1 possible
values, the total number of holograms corresponding to all
N_2 = kp^q cases is (n - 1)(p - 1)p^{n-1}. This includes the
number of holograms obtained in (2).

(4) In all the cases considered so far, N_2 is an integer
multiple of p. The total number of these cases is p^{n-1}. In
the remaining p^n - p^{n-1} cases, N_2 and p are relatively
prime and produce onto mappings. The interconnection for one
of these cases (N_2 = 1) can be realized without any hologram,
while the others need one hologram each. Therefore,
p^n - p^{n-1} - 1 holograms are needed to realize the
interconnections corresponding to these cases.

The total number of holograms for implementing residue
multiplication modulo m = p^n using method A can then be
obtained by adding the numbers derived in (1), (3), and (4).
The result is

N_h = (p^n - 1) + (n - 1)(p - 1)p^{n-1} + (p^n - p^{n-1} - 1)
    = (n + 1)p^n - np^{n-1} - 2.  (A1)
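
As a check on the counting above, the case m = 8 (p = 2, n = 3) can be worked through term by term. The grouping below follows steps (1), (3), and (4); it is an illustrative worked example added here, and its total agrees with the value of 18 quoted for modulus 8 in Sec. V.A.

```latex
\begin{align*}
\text{(1)}\quad & N_2 = 0: && p^n - 1 = 7 \\
\text{(3)}\quad & q = 1,\ N_2 \in \{2, 6\}: && 2 \times 2 = 4 \\
                & q = 2,\ N_2 = 4: && 1 \times 4 = 4 \\
\text{(4)}\quad & N_2 \in \{3, 5, 7\}: && p^n - p^{n-1} - 1 = 3 \\
                & N_h = 7 + (4 + 4) + 3 = 18 && = (n+1)p^n - np^{n-1} - 2 = 32 - 12 - 2.
\end{align*}
```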
Appendix B: Number of Channels in Method B

In this Appendix, analytic expressions are derived for the
number of channels required for implementing residue
multiplication described in Sec. V.B. The two numbers involved
in multiplication are N_1 and N_2, where N_1 is coded as the
presence of light in a channel waveguide, and N_2 is used as
the input to the deflecting system. The required mappings for
an example case can be seen in Fig. 8.

If the modulus is prime, all the mappings except the one that
corresponds to the N_2 = 0 case are onto. For this special
case, all values of N_1 should be mapped to the zero output;
hence p channels are required for the zero value. In all other
cases, each output value has a degeneracy of one. Hence each
of the values 1, 2, ..., p - 1 requires one channel.
Therefore, the total number of required channels is
N_c = p + (p - 1) = 2p - 1.

If the modulus is not prime, more channels are needed. The
case of interest is when the modulus is a power of a prime
number, i.e., m = p^n. The number of required channels for
this general case can be obtained by analyzing the maximum
degeneracy of each output value as follows:

(1) The maximum degeneracy of zero in the output is p^n, which
corresponds to the N_2 = 0 case. Therefore, p^n channels are
needed for the zero value.

(2) Each value expressible as kp, where 0 < k < p^{n-1} and k
and p are relatively prime, has a maximum degeneracy of p.
There are (p - 1)p^{n-2} such values. Therefore, a total of
(p - 1)p^{n-1} channels is needed for the above values.

(3) In general, each of the values expressible as kp^q, where
0 < q < n, 0 < k < p^{n-q}, and k and p are relatively prime,
has a maximum degeneracy of p^q. There are (p - 1)p^{n-q-1}
such values. Therefore, (p - 1)p^{n-1} = p^n - p^{n-1}
channels are needed for each value of q. Since q has n - 1
possible values, a total of (n - 1)(p^n - p^{n-1}) channels is
needed for all values expressible as kp^q. This includes the
number of channels obtained in (2).

(4) All the values considered so far correspond to integer
multiples of p. The total number of these cases is p^{n-1}.
Each of the remaining p^n - p^{n-1} values has a maximum
degeneracy of one. Therefore, p^n - p^{n-1} channels are
needed for these values.

The total number of required channels for implementing residue
multiplication modulo m = p^n using method B can then be
obtained by adding the numbers derived in (1), (3), and (4).
The result is

N_c = p^n + (n - 1)(p^n - p^{n-1}) + (p^n - p^{n-1})
    = (n + 1)p^n - np^{n-1}.  (B1)
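
The same bookkeeping can be followed for a small case with an odd prime, m = 9 (p = 3, n = 2). This worked example is an illustration added here, using only steps (1), (3), and (4) above.

```latex
\begin{align*}
\text{(1)}\quad & \text{value } 0: && p^n = 9 \\
\text{(3)}\quad & q = 1,\ \text{values } 3, 6: && 2 \times 3 = 6 \\
\text{(4)}\quad & \text{values } 1, 2, 4, 5, 7, 8: && p^n - p^{n-1} = 6 \\
                & N_c = 9 + 6 + 6 = 21 && = (n+1)p^n - np^{n-1} = 27 - 6.
\end{align*}
```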
Pattern recognition using reduced information
content filters


Joseph  Shamir,  H.  John  Caulfield,  and  Joseph  Rosen 


Pattern  recognition  by  optical  spatial  filtering  procedures  is  discussed  using  general  considerations  with  the 
objective  of  reducing  the  information  content  in  the  spatial  filter.  The  achievement  of  this  objective  is  very 
useful  toward  the  wide  application  of  spatial  light  modulators  and  also  for  facilitating  distortion  invariant 
recognition.  The  proposed  novel  approach  is  demonstrated  by  an  example  employing  bipolar  spatial  filters 
for  rotation  invariant  pattern  recognition. 


I. Introduction

Usually the emphasis in research toward a useful optical
pattern recognition architecture is the attainment of higher
and narrower correlation peaks employing holographic spatial
filters (Refs. 1 and 2) with high information content. For
real-time applications one would like to use devices like
spatial light modulators that cannot handle these large
amounts of information. The high information content is also a
hindrance when distortion invariance such as rotation or scale
change is considered. For example, both the matched filter
(Ref. 2) and its more recent variant, the phase-only matched
filter (Refs. 3 and 4), yield high correlation peaks.
Unfortunately, these filters are the most intolerant of any
distortion because a large part of their information content
is that of the orientation and scale of the object.

The main objective of this work is development of a pattern
recognition approach taking into consideration the resolution
limitations of presently available spatial light modulators.
To achieve this goal we seek a procedure for reducing to a
minimum the amount of information to be written on these
modulators when they are employed in the input and filter
planes of a pattern recognition system. It is evident that the
penalty to be paid is a reduction in the quality of the
correlation peaks, but this will be a suitable price for
higher flexibility and easier applicability.

We start from general considerations that are independent of
the particular architecture to be adopted. Most of the steps
described may be applied to a diverse set of configurations.
For example, they are valid for coherent or incoherent pattern
recognition performed by employing spatial frequency filtering
or template matching. To obtain shift invariance we shall
restrict the discussion to spatial filtering procedures over
the Fourier transform plane.

II.  General  Considerations 

We define our goal to be the recognition of each pattern in a
set of N patterns, f_i(x,y), (i = 1,2,...,N). The limitation
to N predetermined patterns is not so severe as it seems at
first sight, since one or more of these patterns may be noise
or background. We form 2-D Fourier transforms (FTs), F_i(u,v),
and wish to manufacture a set of filters M_j(u,v),
(j = 1,2,...,N) in such a manner that we obtain an optimal
response represented schematically by the relation

R_{ij} = O\{F_i(u,v); M_j(u,v)\} = \delta_{ij},  (1)

where O is some operator. The degree to which we can approach
this ideal response depends on the operator, the set of
filters, and the patterns involved. For example, we may
consider the integral power reaching the output plane of the
optical system, O(x,y), indicated in the schematic
representation of Fig. 1. By Parseval's theorem this power is
identical with the power transmitted by the filter positioned
at the FT plane [M(u,v) in the figure]. For this configuration
criterion (1) has the form

R_{ij} = \iint |F_i(u,v) M_j(u,v)|^2 \,du\,dv = \delta_{ij}.  (2)

This, however, is a paradoxical requirement since we deal with
a positive definite integrand, and one may have a nonvanishing
filter function only for i = j. Naturally, such a criterion
cannot lead to a selective set of filters, and one should seek
a solution that involves the analysis of a power
redistribution over the output plane.

Fig. 1. Spatial filtering system: L, Fourier transforming
lenses of focal length f; f(x,y), input pattern; O(x,y),
output pattern; and M(u,v), filter function.

15 June 1987 / Vol. 26, No. 12 / APPLIED OPTICS 2311

As our starting point we refer to Fig. 1 and define the
response according to Eq. (1) as the power incident at the
origin of the output plane. (Since we are dealing with Fourier
plane filtering, the position of this origin corresponds to
the position of the object in the input plane.) Denoting by
O_ij(x,y) the output distribution produced by pattern F_i(u,v)
illuminating filter M_j(u,v), this recognition criterion
states

R_{ij} = |O_{ij}(0,0)|^2 = \delta_{ij},  (3)

where, in the configuration of Fig. 1,

O_{ij}(x,y) = \mathcal{F}[F_i(u,v) M_j(u,v)],  (4)

and Eq. (1) is now equivalent to

\left| \iint F_i(u,v) M_j(u,v) \,du\,dv \right|^2 = \delta_{ij}.  (5)

This relation represents a set of linear equations that can be
solved, at least in principle, to generate the filters
M_j(u,v).

III. Filter Generation

To solve Eq. (5) for each filter and generate M_j we have to
sample the Fourier plane. Assuming a rectangular coordinate
system we divide the Fourier plane into K × L regions of area
s_kl each (not necessarily equal), with k = 1,2,...,K and
l = 1,2,...,L. To each of these regions we designate a
constant value M_jkl as its (generally complex) amplitude
transmittance.

Integrating the incident complex amplitude over each region we
form the matrix elements

F_{ikl} = \iint_{s_{kl}} F_i(u,v) \,du\,dv,  (6)

and we may generate the filter samples by solving the set of
N^2 linear equations:

\left| \sum_{k=1}^{K} \sum_{l=1}^{L} F_{ikl} M_{jkl} \right|^2 = \delta_{ij},  (7)

where i,j = 1,2,...,N.

Equation (7) gives N equations for each of the N filters
M_j(u,v) consisting of K × L unknown samples. Thus one may
obtain a unique solution if K × L = N.

This is a very far reaching consequence as it means that to
discriminate among N patterns it is adequate to use filters
with N transmittance values. We have to point out, however,
that the above conclusion is only theoretical and holds if
filters and detection can be implemented with infinite dynamic
range and infinite accuracy. Furthermore, the above relations
were obtained by constraints imposed on a single point in the
output plane. For a satisfactory discrimination, taking into
account practical considerations, this will usually not be
adequate, and the number of equations (and samples) will have
to be multiplied by the number of required discriminating
points. This procedure essentially generates a synthetic
discriminant function (SDF) (Ref. 5).

We considered up to this point K × L rectangular sample
regions just as an example. To attain efficient recognition
the area and shape of these samples must be optimized
according to the recognition task. For another example we
consider rotation invariant pattern recognition with
rotationally invariant filters. For this case the filter
division is along concentric rings. Denoting the radius of the
kth ring by r_k we may have to look for an optimal function
h(k) that gives the various radii:

r_k = h(k).  (8)

An interesting and simple class of these functions can be
written in the form

h(k) = r_1 k^q,  (9)

where r_1 and q are constants. The special case of q = 1/2 is
the Fresnel zone division where all the rings have the same
area, while the case q = -1/2 may be termed the inverse
Fresnel zone plate (i.e., the kth radius of the Fresnel zone
plate multiplied by the kth radius of the inverse Fresnel zone
plate is a constant for all k). These two kinds of division
complement each other with respect to the nature of patterns
to be discriminated. The first kind of division has rings that
become very narrow for high spatial frequency values, thus
making it a good rotation invariant filter for patterns having
their important features at high frequencies. Conversely, the
second choice will be suitable for filtering information at
low spatial frequencies. An intermediate case may be treated
with filters having q = 1 where the width of the rings is
constant. This analysis is reminiscent of the procedures
utilized in Ref. 6 where a specific circular harmonic was
chosen for each recognition task depending on the objects to
be dealt with. Sometimes the useful information is
concentrated only in certain regions of the filter plane. For
example, in many cases the low frequency region does not
contain selective information, and better filtering is
obtained by eliminating the energy in this region altogether.
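
The three ring divisions discussed above can be compared numerically. The sketch below is only illustrative: the function names, the unit value of r_1, and the number of rings are assumptions made for this example, and the exponent of the inverse Fresnel case is taken to be q = -1/2 so that the product of corresponding radii is constant.

```python
import math


def ring_radii(r1, q, count):
    """Radii r_k = r1 * k**q for k = 1..count, following Eqs. (8) and (9)."""
    return [r1 * k**q for k in range(1, count + 1)]


def ring_areas(radii):
    """Areas of the zones bounded by increasing radii; the first zone is a disk."""
    areas, inner = [], 0.0
    for outer in radii:
        areas.append(math.pi * (outer**2 - inner**2))
        inner = outer
    return areas


r1, count = 1.0, 6                       # illustrative values, not from the paper
fresnel = ring_radii(r1, 0.5, count)     # q = 1/2, Fresnel zone division
inverse = ring_radii(r1, -0.5, count)    # q = -1/2, inverse Fresnel division
uniform = ring_radii(r1, 1.0, count)     # q = 1, constant ring width

# Equal zone areas for q = 1/2 (printed in units of pi * r1**2).
print([round(a / math.pi, 6) for a in ring_areas(fresnel)])
# Product of corresponding Fresnel and inverse Fresnel radii.
print([round(a * b, 6) for a, b in zip(fresnel, inverse)])
# Ring widths for q = 1.
print([round(b - a, 6) for a, b in zip(uniform, uniform[1:])])
```

With q = 1/2 every zone has area pi r_1^2, and the product of the kth Fresnel and inverse Fresnel radii is r_1^2 for all k, which is the complementarity described in the text.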

A similar procedure would be implemented for complete scale
invariant pattern recognition where the filter should depend
on angular orientation only and not on the distance from the
origin. For this case one would need radial division lines to
split the filter plane into L sectors.
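
The sampled system of Eq. (7) discussed earlier in this section can be illustrated with a toy calculation. This is a sketch under stated assumptions rather than the authors' procedure: the phase of each equation is fixed so that the sum itself equals the Kronecker delta (one way of satisfying the modulus condition), the number of samples equals the number of patterns, and the sample values, function names, and pure-Python Gauss-Jordan solver are all hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples cannot separate the patterns")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def make_filters(samples):
    """Filter j solves sum_k F_ik M_jk = delta_ij for all patterns i."""
    n = len(samples)
    filters = []
    for j in range(n):
        rhs = [1.0 if i == j else 0.0 for i in range(n)]
        filters.append(solve_linear(samples, rhs))
    return filters


def response(samples, filters, i, j):
    """Power at the output origin for pattern i behind filter j."""
    return abs(sum(f * m for f, m in zip(samples[i], filters[j]))) ** 2


# samples[i][k]: integrated amplitude of pattern i over region k (made-up values).
samples = [[2.0, 1.0], [1.0, 3.0]]
filters = make_filters(samples)
print(filters)
print([[round(response(samples, filters, i, j), 9) for j in range(2)] for i in range(2)])
```

For these made-up samples the solution contains both positive and negative transmittance values, which is the situation the bipolar filters of the next section are meant to handle.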
Fig. 2. Two random patterns to be discriminated.


Fig. 3. Rough representation of a rotationally invariant bipolar
filter made for recognizing the top pattern of Fig. 2.


IV.  Bipolar  Filters  and  Experiment 

In principle the filters described in this work can be
generated similarly to other composite filters or circular
harmonic filters (Ref. 6) as computer generated holograms.
However, the present procedure is more general in approach,
and other implementations are also possible. Although the
information content of these filters is relatively low, a
holographic filter still needs a fairly large bandwidth. To
reduce this requirement we now show that filters with real,
positive, and negative valued transmission characteristics can
perform reasonably well even for rotation invariant pattern
recognition. It has been shown (Ref. 8) that the
implementation of such bipolar filters is possible, and with
the advent of spatial light phase modulators the procedure
becomes rather simple. One major advantage of working with
nonholographic spatial filters is the in-line architecture of
the whole optical system.

In a bipolar filter the amplitude transmittance of each filter
element is real and satisfies the relation

-1 \le M_{jkl} \le 1.  (10)

This is a very serious constraint on the equations determining
these values [Eq. (7)], and in many cases such solutions are
not available. The only way to get around this problem is to
relax the conditions on the right-hand side of the equations
and optimize the solutions.

To demonstrate the procedure we implement a completely
rotation invariant filter. For a general treatment of rotation
or scale invariant pattern recognition, it is useful to
represent the input pattern in polar coordinates. We denote by
F(r,θ) the complex amplitude distribution produced by the
input pattern at the filter plane, and we employ a circularly
symmetric filter. We divide the filter plane into N concentric
rings (where N is now the total number of divisions as
discussed in the previous section) and denote by M_jk the
transmittance (real, positive, or negative) of the kth ring in
the jth filter. Equation (6) can now be rewritten in the form

F_{ik} = \iint_{s_k} F_i(r,\theta) \, r \,dr\,d\theta,  (11)

where integration is performed over the area of the kth ring
s_k. With these definitions Eq. (7) will be replaced by

\left| \sum_{k=1}^{N} F_{ik} M_{jk} \right|^2 = \delta_{ij}.  (12)

Since this relation concerns the absolute values of each
equation, an arbitrary phase may be assigned to render the
values of M_jk real.

Fig. 4. Output intensity distribution with input of Fig. 2 and
operation with the filter of Fig. 3.

To test the viability of the present approach some computer
experiments were performed, and rotation invariant recognition
was demonstrated. One experiment involved random patterns as
shown in Fig. 2. The filter plane was divided into sixty-four
concentric rings, and filters were generated according to
Eq. (12). Figure 3 is an approximate representation of the
rotationally invariant filter made for one of the patterns,
while Fig. 4 is the intensity distribution over the filtered
output plane. The result is quite noisy, in part due to a
large fraction of energy transmitted at zero spatial frequency
that contains no information about the object. If this
frequency component is removed by a modified filter, the cross
section of which is shown in Fig. 5, the filtered output shown
in Fig. 6 is obtained with an appreciably enhanced SNR.

Fig. 5. Cross section along a diameter of the filter with
removal of low frequency components.

Fig. 6. Output intensity distribution for the input of Fig. 2
and filter of Fig. 5 prepared for recognizing the top pattern.


V.  Conclusions 

A simplified approach to optical pattern recognition
was proposed to make its practical application more
feasible. As an example of possible implementation of
the present approach, a recognition criterion was chosen
so that the filters contain information about the
complete complex amplitude distribution of the patterns.
Using computer experiments it was shown that
adequate information may be contained in bipolar filters
to recognize patterns even in a completely shift-
and rotation-invariant manner with no need for holographic
filters. In a subsequent publication it will be
shown that the approach presented here can be employed
for different kinds of filters, i.e., phase filters,
and patterns of various nature.

It should be emphasized that criterion (1) can never
be exactly satisfied. Further studies are being carried out to
search for possibly better criteria that may also be
easier to implement optically.
Abstract

Rotation-invariant  pattern  recognition  is  shown  to  have  intrinsic  limitations  determined  by  the  set  of 
patterns  to  be  recognized  and  by  the  specific  optical  setup.  Within  these  limitations,  a  general  procedure 
is  proposed  for  the  generation  of  bipolar  filters  that  do  not  require  the  sensitive  alignment  procedures 
involved  in  holographic  filters  and  are  suitable  for  superposition  synthesis  to  achieve  rotation  invariance. 

Introduction 

The oldest and most straightforward approach to pattern recognition is image-plane analysis (or
template matching), where one compares the image of the object with some stored pattern. If the object is,
for example, a typed page, one should scan the page to locate each letter and then compare it with given
template letters. To consider the additional possibility that some of the letters may be rotated, we shall
have to perform a rotation for each template at each letter position. In an automatic system that has to
perform all these operations, we are confronted by an incredibly time-consuming task even for the most
advanced computers. Therefore it would be very useful to replace the templates by some rotation-invariant
filters whenever possible.

Besides the problem of rotation, the main drawback of image-plane analysis is its position dependence.
This problem is resolved by transferring the image-plane operation to the Fourier plane, where the whole input
information is addressed simultaneously. The conventional optical system used to perform this procedure is shown in
Fig. 1. A coherent plane wave illuminating the input function $f(x,y)$ generates its Fourier transform (FT) over
plane M, giving rise to a complex amplitude distribution, $F(u,v)$. One may record the intensity distribution
on a photographic plate to produce a filter with amplitude transmittance $|F|^2$. Reinserting this filter into
plane M and replacing the function $f$ by some other function, $g(x,y)$, produces a complex amplitude distribution
immediately to the right of M given by $G|F|^2$, where $G$ is the FT of $g$. An additional FT performed by the
second lens yields, on the output plane, O, the triple convolution

$$g(x,y) * f^*(-x,-y) * f(x,y), \qquad (1)$$


Fig. 1. In-line spatial filtering system: L - Fourier transforming lenses of focal
length f; f(x,y) - input pattern, O(x,y) - output pattern and M(u,v) -
filter function.

where $*$ denotes convolution and $f^*$ is the complex conjugate of $f$. Expression (1) is the required cross-
correlation of $f$ and $g$, but this is convolved with the function $f$, which makes it a rather poor measure of
correlation even before considering rotations. It should be noted, however, that this kind of filter is just
one possibility. For example, a better response could be obtained by using the same filter as an intensity
filter instead of an amplitude filter, with incoherent illumination (Ref. 1).
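
The structure of expression (1) can be checked numerically. The following minimal sketch is an illustration added here, not part of the original experiment. It assumes a 1-D periodic stand-in for the 2-D patterns (the random arrays `f` and `g` are hypothetical) and uses an inverse FFT in place of the second lens, which in the optical system performs a forward transform and therefore inverts the output coordinates. Under these assumptions it shows that multiplying the spectrum of g by $|F|^2$ is equivalent to convolving g with $f^*(-x)$ and then with $f$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
f = rng.random(n)   # reference pattern (hypothetical 1-D stand-in for f(x, y))
g = rng.random(n)   # test pattern (hypothetical)

F = np.fft.fft(f)
G = np.fft.fft(g)

# Filter plane: the amplitude transmittance |F|^2 multiplies the spectrum of g.
out_fourier = np.fft.ifft(G * np.abs(F) ** 2)

def circ_conv(a, b):
    """Circular convolution computed directly from its definition."""
    m = len(a)
    return np.array([sum(a[k] * b[(x - k) % m] for k in range(m))
                     for x in range(m)])

# f*(-x) on a periodic grid: index x maps to (-x) mod n.
f_flip_conj = np.conj(np.roll(f[::-1], 1))

# Triple convolution of expression (1): g * f*(-x) * f.
out_direct = circ_conv(circ_conv(g, f_flip_conj), f)
```

The two results agree to rounding error, which illustrates why the output is the cross-correlation of g and f blurred by a further convolution with f.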

To obtain an appreciable improvement over the above considered possibilities, most of the presently
used optical methods for pattern recognition (Ref. 2) are based on the holographic matched filter, first
proposed by VanderLugt. This Fourier-plane filter has a high resolution but is very sensitive to misalignment,
image scaling and object rotation. From the practical point of view one seldom needs this high sensitivity,
and the stringent alignment requirements limit the applicability of the method.
Many attempts were made to render the VanderLugt filter orientation insensitive. Various averaging
methods (Refs. 4-12) to produce filters that recognize classes of objects rather than specific patterns were quite
successful and, to some extent, could also be generalized to treat a range of angular orientations. Spatial
multiplexing techniques (Refs. 13-15) are useful in principle but not very practical due to alignment problems and, in
most cases, the involvement of mechanically moving components. Photodetector array detection on the Fourier
plane (Ref. 16) and computer processing is also possible, but it is applicable only for single-pattern-at-a-time
analysis. Recently some more sophisticated rotation-invariant methods were proposed where the spatial filters
are based on circular harmonic decomposition generated by digital computers and recorded holographically (Refs. 17, 18).
In principle this approach proved quite efficient but, unfortunately, it involves elaborate hardware and the
use of inconvenient components such as liquid gates that hinder its practical application. The generalized
analysis of Ref. 19 may be helpful to estimate the response of filters with various degrees of rotation
invariance to a specific input, but this should be augmented with some derivation of filter selectivity to
different inputs.


In this work we address first the general question of the limitations imposed on rotation-invariant pattern
recognition by the intrinsic nature of optical methods. It will be indicated that the answer depends on
the specific patterns to be recognized and on the actual procedures applied. With these limitations kept in
mind we propose a new approach to the synthesis of filters. This approach should be straightforward to
implement and easy to use in practice.


II.  Some  Limitations  On  Rotation-Invariant  Pattern  Recognition 


Addressing the general question of rotation-invariant pattern recognition, we use polar coordinates to
represent the input pattern, $f(r,\theta)$. At this stage, $f$ represents the complex amplitude distribution produced
by the input pattern at the filter plane, where we insert a filter with amplitude transmittance $m(r,\theta)$. This
plane may be either the image plane or the Fourier plane, whichever is more convenient for a certain application.
As pointed out in Ref. 19, there are a number of ways to define the performance of a filter. One of
these possibilities is the integral detection of all the light arriving at the output plane. Using the principle
of energy conservation, this integral quantity is given by the total power transmitted by the filter:


$$R(0) = \int_0^{2\pi}\!\!\int_0^{r_m} \left| m(r,\theta)\, f(r,\theta) \right|^2 r\, dr\, d\theta, \qquad (2)$$

where $r_m$ represents the size of the filter, assumed circular. To investigate the response for rotated objects
we may keep the object constant and rotate the filter, assuming that all the rest of the system is circularly
symmetric. The response with the filter rotated into the $\theta_0$ orientation may be described by the relation

$$R(\theta_0) = \int_0^{2\pi}\!\!\int_0^{r_m} |m(r,\theta - \theta_0)|^2\, |f(r,\theta)|^2\, r\, dr\, d\theta. \qquad (2a)$$

If we want to make this response rotation-invariant we have to require

$$\frac{\partial R(\theta_0)}{\partial \theta_0} = 0,$$

which leads to the obvious result that $|m|$ should be independent of the angular coordinate, $\theta$, apart from a
phase variation that may change the output distribution but not its integral power. Thus, to generate a
rotation-invariant filter one has to apply some amplitude averaging procedure over the angular coordinate.
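
A small numerical sketch of Eq. (2a) may make this requirement concrete. It is an added illustration under stated assumptions: the field `f` and the filter `m_general` are random arrays sampled on a hypothetical polar grid, a rotation is modelled as a circular shift along the angular axis, and the constant sampling factor of the integral is omitted. The response of an arbitrary filter changes with the rotation, while the response of its angular average does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, n_theta = 32, 90
r = np.linspace(0.0, 1.0, n_r)          # radial samples, filter radius set to 1 (assumed)

# Hypothetical complex amplitude f(r, theta) at the filter plane.
f = rng.random((n_r, n_theta)) + 1j * rng.random((n_r, n_theta))

def response(m, shift):
    """Discrete Eq. (2a): filter m rotated by `shift` angular samples."""
    m_rot = np.roll(m, shift, axis=1)
    integrand = np.abs(m_rot) ** 2 * np.abs(f) ** 2 * r[:, None]
    return integrand.sum()              # constant dr * dtheta factor omitted

m_general = rng.random((n_r, n_theta))  # transmittance that depends on theta
# Angular averaging: every ring takes the mean transmittance of that ring.
m_averaged = np.repeat(m_general.mean(axis=1, keepdims=True), n_theta, axis=1)

R_general = np.array([response(m_general, s) for s in range(n_theta)])
R_averaged = np.array([response(m_averaged, s) for s in range(n_theta)])
```

Here `R_averaged` is constant over all rotations, whereas `R_general` fluctuates with the filter orientation.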

The most general result from these considerations is that rotation-invariant pattern distinction is
possible only among patterns the angular averages of which differ from each other at the filter plane. Later
it will be indicated that the response of Eq. (2) does not provide very discriminating detection. Nevertheless, the
conclusion regarding the angular independence of $|m|$ for rotation invariance is quite general, but in most
cases restrictions may arise also for the phase variation.

It is very useful to note here that the same class of patterns that is suitable for rotation-invariant
recognition in the image plane may be impractical for rotation-invariant recognition in the Fourier plane.
Difficulties that may occur can be illustrated by considering the simple block characters shown in Fig.
2. Assuming that the lines composing the characters are transparent while all the rest is opaque, we rotate
each character around its center of mass and record the transmitted intensity in the image plane. Many of
the patterns $m(r)$ generated this way will have different features characteristic of the original letter.
Thus, in principle, these masks may serve as some crude rotation-invariant recognition "templates" for the
set of characters. This may not be the case if we convert to Fourier plane analysis.
Figure 2. A sample of characters for recognition.

Figure 3 shows the optically generated FT of the characters in Fig. 2. The highest and most intense
spatial frequency components are generated by the lines constructing the characters. The absolute magnitude
of these components is almost identical for all the characters, the distinguishing feature being only in the
orientation of the FT produced by the various line segments. At first sight it would appear that a rotationally
invariant mask placed in this plane will have difficulties in distinguishing among these unless it
can resolve minute differences due to various lengths of the line segments. It can be seen, however, from
the rotational averages shown in Fig. 4 that appreciable differences still exist, but they will decrease as
the line segments get longer when compared to their width. Stated in a more general way, the class of patterns
generated by narrow lines (such as line drawings), where the line width is the minimum feature size with
all other features having much larger dimensions, is not suitable for rotation-invariant pattern recognition
in Fourier plane procedures.


Figure 3. Optical Fourier transform of characters from Fig. 2.

Figure 4. Rotation averages of transforms in Figure 3.


III.  Bipolar  Filters  For  Pattern  Recognition 


The two extreme approaches for the generation of spatial filters for pattern recognition, i.e., the power
spectrum filter and the holographic matched filter, were discussed in the Introduction. Observing the present
state of the art, it appears that the performance of the first kind is too poor, while the applications of
holographic filters are tedious and frequently require quite sophisticated hardware and software. In this
section we introduce a modified approach to filter synthesis that should be simple to implement and to use.

To achieve this goal we consider the following requirements: a) The spatial filter should be less sensitive
than a holographic filter but should have a better performance than the simple power spectrum filter; b) The
pattern recognition system should give a high response for a specific input together with a negligible output
for any other pattern included in a given set; c) Since we are interested in rotation-invariant pattern
recognition, the proposed method should be applicable for this purpose too. One way to meet requirement (a)
is by the synthesis of a medium resolution filter that discards most of the phase information, retaining some
of it in a bipolar form. The filter will also conform with requirement (b) if its generation takes into
consideration all the patterns to be analyzed. In the following we propose a number of variations to the
implementation of such filters and convert them into rotation-invariant filters in the next section.

Bipolar pattern recognition is usually considered in connection with incoherent illumination (Refs. 1, 20).
However, in the present procedure we start from general considerations that are applicable to incoherent as
well as coherent illumination, either for processing performed in the image plane or the Fourier plane. To
keep this work within reasonable limits we shall mainly address Fourier plane processing with coherent
illumination.


In Refs. 4-8 various mathematical procedures were investigated for the synthesis of composite matched spatial
filters. Adopting a similar approach, we rely on the fact that the procedure has been proven to be
mathematically sound and simplify the derivation by avoiding steps such as decomposition into sets of
orthonormal functions. Consider a set of patterns, $f_i(x,y)$ $(i = 1, 2, \ldots, N)$, to be discriminated against
each other. We form their 2-dimensional FT, $F_i(u,v)$, and wish to manufacture a set of filters $M_j(u,v)$
$(j = 1, 2, \ldots, N)$ in such a way that they will transmit light only if illuminated by a specific pattern.

Mathematically, this requirement may be expressed by the relation

$$\iint |F_i(u,v)|^2\, |M_j(u,v)|^2\, du\, dv = c\,\delta_{ij},$$


where we assume $M_j$ to be a complex amplitude transmission function. Unfortunately, this is a paradoxical
requirement since we deal with positive definite integrands. Therefore, any $|F_i| \neq 0$ requires $|M_j| = 0$
for all $i \neq j$, resulting in, at the most, one filter that transmits light. Naturally, such a criterion
cannot lead to a selective set of filters, indicating that the response function of Eq. (2) is not a good
choice for our purposes. To improve performance we take one step towards a holographic matched filter (the
main function of which is a certain redistribution of the power over the output space) and try a bipolar set
of filters with the requirement

$$\iint |F_i(u,v)|\, M_j(u,v)\, du\, dv = c\,\delta_{ij}. \qquad (4)$$
One way to synthesize a set of real filters that satisfy this relation is by the linear superposition (a
composite filter),

$$M_j = a_{kj}\, |F_k(u,v)|, \qquad (5)$$

where $a_{kj}$ are real (positive or negative) constants and summation over identical indices is postulated.
Although the functions $|F_i(u,v)|$ do not constitute an orthonormal set, one may still substitute Eq. (5)
into Eq. (4) and solve the following set of equations,

$$a_{kj} \iint |F_i(u,v)|\, |F_k(u,v)|\, du\, dv = c\,\delta_{ij}, \qquad (6)$$

to evaluate $a_{kj}$.
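
The procedure of Eqs. (4)-(6) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input patterns are hypothetical random arrays, the integrals are replaced by sums over FFT samples, and $c = 1$. The names `gram`, `A` and `filters` are introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, size = 3, 16
patterns = rng.random((N, size, size))      # hypothetical inputs f_i(x, y)
absF = np.abs(np.fft.fft2(patterns))        # |F_i(u, v)|, shape (N, size, size)

# Overlap matrix of Eq. (6): gram[i, k] = sum over (u, v) of |F_i| |F_k|.
flat = absF.reshape(N, -1)
gram = flat @ flat.T

c = 1.0
# Eq. (6) in matrix form: gram @ A = c * I, with A[k, j] = a_kj.
A = np.linalg.solve(gram, c * np.eye(N))

# Eq. (5): M_j = sum_k a_kj |F_k|, one real bipolar filter per pattern.
filters = np.tensordot(A.T, absF, axes=1)   # shape (N, size, size)

# Eq. (4): responses[i, j] = sum over (u, v) of |F_i| M_j.
responses = flat @ filters.reshape(N, -1).T
```

With these assumptions `responses` reproduces $c\,\delta_{ij}$, and the filters necessarily contain negative values, which is the bipolar character discussed in the text.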


Deferring to a later stage the discussion of some difficulties involved in this procedure, we proceed to
investigate the performance of the system, assuming that we possess a filter set described by the characteristics
indicated in the above relations. Inserting one of the filters, $M_j$, into the Fourier plane, M of
Fig. 1, we illuminate it with pattern $f_n$ placed in the input plane. For convenience, we write the FT of
$f_n$ in the form

$$F_n(u,v) = |F_n(u,v)| \exp[i\phi_n(u,v)], \qquad (7)$$

where $\phi_n(u,v)$ is a real phase function. The complex amplitude distribution of Eq. (7) is transmitted by the
filter and transformed by the second lens to produce the output distribution

$$O(x,y) = \mathcal{F}\{F_n M_j\} = a_{kj}\, \mathcal{F}\{|F_n|\, |F_k| \exp(i\phi_n)\}, \qquad (8)$$

where $\mathcal{F}$ represents FT. In most cases of interest here, where we shall deal with real input patterns, the
main contribution of the phase function will be a translation of the output pattern to a location
corresponding to its position on the input plane. Thus it is useful to consider the alternative form

$$O(x,y) = a_{kj}\, \mathcal{F}\{|F_n|\, |F_k|\} * \mathcal{F}\{\exp(i\phi_n)\}. \qquad (9)$$

Since convolution is a linear process, one may perform the summation on $k$ first and the convolution afterwards.
The summed function reduces at the origin to Eq. (6). Thus one would expect a strong correlation
for $n = j$, with some weak and blurred responses for all other input patterns produced by some contribution
away from the origin. The convolution with the phase function will introduce a partial reconstruction of
the object and position it according to its location in the input plane.

IV.  Rotation-Invariant  Spatial  Filters 


In the following, we restrict ourselves to sets of patterns that are, in principle, suitable for rotation-
invariant recognition and use a procedure similar to the previous one.

We start by representing the functions $F_i$ in a polar coordinate system,

$$F_i(u,v) \rightarrow F_i(r,\theta), \qquad (10)$$

and define a set of positive, real, normalized and rotation-invariant functions,

$$\bar{F}_i(r) = \frac{\int_0^{2\pi} |F_i(r,\theta)|\, d\theta}{\iint |F_i(r,\theta)|^2\, r\, dr\, d\theta}. \qquad (11)$$
According to our discussion on the limitations of rotation-invariant pattern recognition, we would
expect that discrimination will be possible if these functions differ "appreciably" among themselves. The
amount of difference implied by the word "appreciably" depends on the actual experimental system. The differences
may be very minute for computer simulation with arbitrary accuracy but should be much larger for a
real system. We shall return to this point at the end of this section.
Following our previous procedure, we search for a set of filters that satisfy the orthogonality
relation

$$\int \bar{F}_i(r)\, M_j(r)\, 2\pi r\, dr = c\,\delta_{ij}. \qquad (12)$$

Here too, we shall generate these filters as linear combinations of the input patterns:

$$M_j(r) = a_{kj}\, \bar{F}_k(r). \qquad (13)$$

Substitution into Eq. (12) leads again to $N^2$ linear algebraic equations,

$$a_{kj} \int \bar{F}_i(r)\, \bar{F}_k(r)\, 2\pi r\, dr = c\,\delta_{ij}. \qquad (14)$$

To implement this filter set, one should determine the set $\bar{F}_i$, solve the linear equations for $a_{kj}$, and
construct the filters accordingly.

Since the set $\bar{F}_i$ is positive definite, the solution of Eq. (14) will lead to positive and negative
values for $a_{kj}$. As mentioned earlier, we gave up the complete phase dependence of the holographic filter
but we still require bipolar values.

There are a number of possible approaches for the practical construction of the above described filters,
and here we proceed to describe one that may be called a quantized filter. Recalling that a rotation-
invariant filter should be circularly symmetric, we divide the Fourier plane into $N$ concentric rings. (The
spacing of these rings and the number $N$ will be discussed in the next section.) We now represent each of
the $N$ normalized functions, $\bar{F}_i(r)$ (Eq. 11), by a vector $P_i$ with components $p_{ij}$ proportional to the square
root of the total power incident on the $j$-th ring. The $N$ circularly symmetric filters will also be represented
by vectors in $N$-space, $M_k$, with components $a_{kj}$ that are the amplitude transmittance of the $j$-th ring
in the $k$-th filter. These circular filters will perform their proper function if we require again the orthogonality
relation

$$p_{ij}\, a_{kj} = c\,\delta_{ik}. \qquad (15)$$

This equation constitutes $N^2$ linear equations for the determination of the $N^2$ elements of the matrix
$a_{kj}$. Since we are dealing here with a matrix, it will be convenient to put the whole procedure into a matrix
form. We construct the two matrices

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ p_{21} & p_{22} & \cdots & p_{2N} \\ \vdots & & & \vdots \\ p_{N1} & \cdots & & p_{NN} \end{pmatrix} \qquad (16)$$

and

$$M = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{N1} \\ a_{12} & a_{22} & \cdots & a_{N2} \\ \vdots & & & \vdots \\ a_{1N} & \cdots & & a_{NN} \end{pmatrix} \qquad (17)$$

and write Eq. (15) in the convenient form

$$P M = I, \qquad (18)$$
where $I$ is the unit matrix. Thus we see that the calculation of our filter reduces to a simple inversion of
a matrix derived from measurements during the learning stage:

$$M = P^{-1}. \qquad (18a)$$
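
The quantized-filter construction can be sketched in a few lines. This is an added illustration under stated assumptions: the three angular-averaged profiles are hypothetical Gaussians rather than measured spectra, the rings have equal width, the number of rings equals the number of patterns, and the constant radial sampling factor is dropped because the components $p_{ij}$ are only defined up to proportionality.

```python
import numpy as np

N = 3                                        # patterns = rings (assumed)
r = (np.arange(300) + 0.5) / 300             # radial sample midpoints in (0, 1)

# Hypothetical angular-averaged spectra of three patterns (not from the paper).
profiles = np.stack([np.exp(-((r - c0) / 0.15) ** 2) for c0 in (0.2, 0.5, 0.8)])

edges = np.linspace(0.0, 1.0, N + 1)         # equal-width ring boundaries
ring_index = np.digitize(r, edges[1:-1])     # ring number (0..N-1) of each sample

P = np.zeros((N, N))
for j in range(N):
    in_ring = ring_index == j
    # Power of every pattern falling on ring j (area element 2*pi*r dr).
    power = (profiles[:, in_ring] ** 2 * 2 * np.pi * r[in_ring]).sum(axis=1)
    P[:, j] = np.sqrt(power)                 # p_ij ~ sqrt(power of pattern i in ring j)

M = np.linalg.inv(P)                         # Eq. (18a); column k holds the rings of filter k
response = P @ M                             # Eq. (18): should be the unit matrix
```

Because every entry of `P` is positive, each column of `M` must contain both signs to cancel the responses of the other patterns, so the resulting ring transmittances are bipolar.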

Equations (6) and (14) may also be put in this matrix form by replacing the matrix elements and defining

$$p_{ik} = \iint |F_i(u,v)|\, |F_k(u,v)|\, du\, dv \qquad (19)$$

for Eq. (6), and similarly for Eq. (14), using the normalized functions. It is interesting to note that Eq.
(19) represents a symmetric matrix, which is not necessarily the case for the original matrix defined in
Eq. (15).

The limitations in the implementation of rotation-invariant filters discussed in Section II are implicitly
included in Eq. (18a), and we may return to the phrase "appreciably different" mentioned at the
beginning of this section. It is immediately evident that if we have two identical patterns the matrix cannot
be inverted. However, if there are two patterns that differ only slightly, a solution may be obtained
but with $|m_{ij}| \gg 1$. The result is that the equations [in Eq. (15)] having zero on their right-hand side will not
be affected but, unfortunately, the values of unity will have to be divided by $|m_{ij}|_{\max}$; i.e., we shall have
to deal with an autocorrelation represented by a small fraction, $1/|m_{ij}|_{\max}$, that may be undetectable. For
any practical system this number will determine the limits imposed on recognition possibilities.
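
The effect of nearly identical patterns on the inverted matrix can be illustrated with a small numerical sketch. The matrices below are hypothetical values chosen for illustration, and the helper `peak_after_normalization` is a name introduced here. It inverts the pattern matrix and reports the largest filter element together with the autocorrelation value that remains once the filter is rescaled so that no transmittance exceeds unity.

```python
import numpy as np

def peak_after_normalization(P):
    """Invert P (Eq. 18a) and rescale so the largest |transmittance| is 1."""
    M = np.linalg.inv(P)
    m_max = np.abs(M).max()
    # After division by m_max the unit entries of P @ M become 1 / m_max.
    return m_max, 1.0 / m_max

# Hypothetical pattern matrix with clearly distinct rows.
base = np.array([[1.0, 0.2, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.4, 1.0]])

# Same set, but the second pattern now differs only slightly from the first.
similar = base.copy()
similar[1] = similar[0] + 1e-3 * np.array([1.0, -1.0, 0.5])

m_distinct, peak_distinct = peak_after_normalization(base)
m_similar, peak_similar = peak_after_normalization(similar)
```

For the nearly degenerate set the largest filter element grows by orders of magnitude and the normalized autocorrelation peak shrinks accordingly.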

V. Filter-Plane Division

In the previous section the filter plane was divided into concentric rings without specifying their
widths. The reason is that the optimal division actually depends on the class of patterns to be recognized.
In principle one may use an arbitrary function to derive the radius, $r_n$, of the $n$-th circle in the filter
plane,

$$r_n = h(n). \qquad (20)$$

An interesting and simple class of these functions can be written in the form

$$h(n) = r_1 n^q. \qquad (21)$$

The special case of $q = 1/2$ is the Fresnel zone division, where all the rings have the same area, while
the case $q = -1/2$ may be termed the "inverse Fresnel zone plate" (i.e., the $n$-th radius of the Fresnel zone
plate multiplied by the $n$-th radius of the inverse Fresnel zone plate is a constant for all $n$). These two
kinds of division complement each other with respect to the nature of patterns to be discriminated. The
first kind of division has rings that become very narrow for high spatial frequency values, thus making it a
good rotation-invariant filter for patterns having their important features at high frequencies.
Conversely, the second choice will be suitable for filtering information at low spatial frequencies. An
intermediate case may be treated with filters having $q = 1$, where the width of the rings is constant. This
analysis is reminiscent of the procedures utilized in Ref. 16, where a specific circular harmonic was chosen
for each recognition task depending on the objects to be dealt with.
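
The three divisions mentioned above are easy to tabulate. The following sketch is an added illustration: the function name `ring_radii`, the choice of eight rings and the unit value of $r_1$ are assumptions made here. It evaluates $r_n = r_1 n^q$ and checks the stated properties of each case.

```python
import numpy as np

def ring_radii(n_rings, q, r1=1.0):
    """Radii r_n = r1 * n**q for n = 1..n_rings (Eqs. 20 and 21)."""
    n = np.arange(1, n_rings + 1).astype(float)
    return r1 * n ** q

fresnel = ring_radii(8, 0.5)     # Fresnel zone division
inverse = ring_radii(8, -0.5)    # "inverse Fresnel zone plate"
uniform = ring_radii(8, 1.0)     # constant ring width

# Areas of successive Fresnel rings (the first "ring" is the central disk).
areas = np.pi * np.diff(np.concatenate(([0.0], fresnel ** 2)))

# Widths of successive rings for q = 1.
widths = np.diff(np.concatenate(([0.0], uniform)))

# Product of corresponding Fresnel and inverse-Fresnel radii.
products = fresnel * inverse
```

With $r_1 = 1$ every Fresnel ring has area $\pi$, every product of corresponding radii equals 1, and the $q = 1$ rings all have unit width.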

The number of rings in each filter may also be chosen in a flexible way utilizing optimization
algorithms. However, to avoid unnecessary complications at this stage, we made the number of rings equal to
the number of patterns, which makes the solution of Eqs. (15) unique. Other implications of this
subject will be addressed in a subsequent work.

VI. Discussion


A simplified approach to optical pattern recognition was proposed to make its practical application more
feasible. Emphasis was placed on rotation-invariant pattern recognition and its intrinsic limitations.
Some of the aspects treated have quite general implications. For example, it was shown that integral
transmission detection is a poor measure for pattern distinction. Therefore the present procedure, like
holographic matched filtering, relies on the intensity distribution over the output plane. Further research
is required for the determination of the actual influence of various parameters such as information content
and possible phase variation in the filter.


The extension of the present method to class discrimination is, in principle, a straightforward procedure.
For example, to implement a mask that determines whether a certain pattern belongs to a subset (A),
one might superpose all the mask vectors belonging to that subset:

$$M_A = M_{a_1} + M_{a_2} + \cdots \qquad (22)$$


Although mathematically this relation is quite simple, one should keep in mind the need for filter normalization
required by the physical limitations.
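
A class mask of this kind, including the normalization step, can be sketched as follows. The pattern matrix `P` and the choice of subset are hypothetical values introduced for illustration; the mask vectors are taken as the columns of the inverted matrix, as in Eq. (18a).

```python
import numpy as np

# Hypothetical learning-stage matrix P (rows: patterns, columns: rings).
P = np.array([[1.0, 0.3, 0.2, 0.1],
              [0.2, 1.0, 0.3, 0.1],
              [0.1, 0.2, 1.0, 0.3],
              [0.3, 0.1, 0.2, 1.0]])
M = np.linalg.inv(P)                 # column k is the mask vector of pattern k

subset_A = [0, 2]                    # assumed class: patterns 0 and 2
M_A = M[:, subset_A].sum(axis=1)     # Eq. (22): superposed mask vectors

class_response = P @ M_A             # response of every pattern to the class mask

# A physical mask cannot exceed unit transmittance, so rescale before use.
M_A_physical = M_A / np.abs(M_A).max()
physical_response = P @ M_A_physical
```

Members of the subset give a unit response before normalization and a common reduced response afterwards, while the other patterns remain at zero in both cases.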


266 / SPIE Vol. 613 Nonlinear Optics and Applications (1986)


In a subsequent publication the subject of scale-invariant pattern recognition will be addressed using
a similar approach.

It is a pleasure to thank G. Daniels for performing the photographic work involved in this
investigation.
Circular harmonic phase filters for efficient rotation-invariant
pattern recognition


Joseph  Rosen  and  Joseph  Shamir 


A generalized approach for pattern recognition using spatial filters with reduced tolerance requirements was
described in some recent publications. This approach leads to various possible implementations such as the
composite matched filter, the circular harmonic matched filter, or the composite circular harmonic matched
filter. The present work describes new examples leading to very high selectivity filters retaining rotation
invariance and reduced requirements on device resolution. Computer simulations and laboratory experiments
show the advantages of this approach.


I.  Introduction 

Conventional methods of optical pattern recognition
suffer from the requirement of high resolution
recording materials and distortion sensitivity. In
some recent publications (Refs. 1, 2) a new, general procedure
was introduced that may be employed for generating
spatial filters with reduced resolution requirements.
Partial and complete rotation invariance was demonstrated
in computer simulations and laboratory experiments
employing bipolar amplitude filters, phase-
only filters, and composite phase filters.

In this work we show that a good example of the new
procedure is the circular harmonic component filter in
its regular complex amplitude form and also in its
phase-only form. These filters can be used as the
basic constituents in a composite filter where the advantages
of phase-only filters and complex amplitude
filters are combined. The initial goal of our research
project (Ref. 1), i.e., the use of reduced information content
filters, is preserved together with a high degree of distortion
invariance. In this paper we demonstrate rotation
invariance only, but preliminary experiments
indicate that scale invariance can be approached with a
similar procedure.
II. Rotation-Invariant Filter Design

Our objective is to find an efficient filter, determined
by a characteristic function $g(x,y)$, that can
recognize a pattern $f(x,y)$ in the presence of other
patterns. The recognition criterion will use the conventional
correlation function

$$C(x_0,y_0) = \iint f(x,y)\, g^*(x - x_0, y - y_0)\, dx\, dy, \qquad (1)$$

and in particular its value at the origin

$$C(0) = \int_0^\infty\!\!\int_0^{2\pi} f(r,\theta)\, g^*(r,\theta)\, r\, d\theta\, dr, \qquad (2)$$

where we converted to polar coordinates for convenience
in treating the subjects of rotation and scale
invariance. Defining this equation as the system response,
one may also define the response for an object
rotated by an angle $\alpha$,

$$C(0;\alpha) = \int_0^\infty\!\!\int_0^{2\pi} f(r,\theta + \alpha)\, g^*(r,\theta)\, r\, d\theta\, dr. \qquad (3)$$

Ideally one would like to keep $C(0;\alpha)$ constant
regardless of the value of $\alpha$. However, since this
requirement is usually beyond practical limits one has to
look for various compromises. For example, the performance
of circular harmonic component filters has been
investigated for completely rotation-invariant pattern
recognition by Arsenault and Sheng (Ref. 3). A filter made
for a single circular harmonic component yields a
correlation

\[ C(0;\alpha) = K \exp(jn\alpha), \tag{4} \]

where $K$ is a constant and $n$ is the order of the
harmonic. For intensity measurements this response is quite
satisfactory, since $|C(0;\alpha)|^2 = |K|^2$ does not
depend on $\alpha$.
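
As a numerical illustration of Eq. (4), the following sketch (Python with NumPy, not part of the original work) samples a hypothetical object on a polar grid, builds a filter from its n-th circular harmonic, and evaluates the response of Eq. (3) for every rotation that is a multiple of the angular sampling step. The random test object, the grid sizes, and the name `harmonic_filter_response` are assumptions made purely for illustration.

```python
import numpy as np

def harmonic_filter_response(f_polar, r, n, shift):
    """Discrete version of Eq. (3) for alpha = shift * d_theta.

    f_polar : real array, rows indexed by r, columns by theta in [0, 2*pi).
    The filter is built from the n-th circular harmonic of f_polar itself.
    """
    n_theta = f_polar.shape[1]
    theta = 2 * np.pi * np.arange(n_theta) / n_theta
    # n-th circular harmonic component f_n(r) of the object
    f_n = (f_polar * np.exp(-1j * n * theta)).mean(axis=1)
    # single-harmonic filter g(r, theta) = f_n(r) exp(j n theta)
    g = f_n[:, None] * np.exp(1j * n * theta)
    # rotated object f(r, theta + alpha), obtained by a cyclic shift in theta
    f_rot = np.roll(f_polar, -shift, axis=1)
    d_theta = 2 * np.pi / n_theta
    d_r = r[1] - r[0]
    return np.sum(f_rot * np.conj(g) * r[:, None]) * d_theta * d_r

rng = np.random.default_rng(0)
r = np.linspace(0.1, 1.0, 16)
f_polar = rng.random((16, 64))          # hypothetical test object
responses = np.array(
    [harmonic_filter_response(f_polar, r, n=2, shift=k) for k in range(64)]
)
intensity = np.abs(responses) ** 2
```

For rotations that coincide with the angular sampling step, `intensity` takes the same value for every rotation while the phase of `responses` advances as $n\alpha$, which is the behavior stated in Eq. (4).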

In the present approach we turn around the argument
and start by defining the required response,
$C(0;\alpha)$. Considering this response as a function of
the variable $\alpha$ it can be decomposed into a Fourier
series:

\[ C(0;\alpha) = \sum_n c_n \exp(jn\alpha). \tag{5} \]

15 July 1988 / Vol. 27, No. 14 / APPLIED OPTICS 2695

Working in the Fourier plane it is useful to represent
the Fourier transform of the input patterns and the
characteristic filter functions in a circular harmonic
decomposition:

\[ F(\rho,\phi) = \sum_n F_n(\rho) \exp(jn\phi), \tag{6} \]

\[ G(\rho,\phi) = \sum_n G_n(\rho) \exp(jn\phi), \tag{7} \]

where $\rho$ and $\phi$ are the polar coordinates in the
Fourier plane. It is easy to show that the value of the
correlation function at the origin [Eq. (3)] can also be
written in the simple form

\[ C(0;\alpha) = \int_0^{2\pi}\!\int_0^{\infty} F(\rho,\phi + \alpha)\, G^*(\rho,\phi)\, \rho\, d\rho\, d\phi. \tag{8} \]

Comparing this with Eq. (5) and using the orthogonality
of the exponentials we obtain

\[ \sum_n c_n \exp(jn\alpha) = \sum_n \int_0^{\infty} F_n(\rho)\, G_n^*(\rho) \exp(jn\alpha)\, \rho\, d\rho, \tag{9} \]

or

\[ c_n = \int_0^{\infty} F_n(\rho)\, G_n^*(\rho)\, \rho\, d\rho. \tag{10} \]

Following the traditional way of matching a certain
circular harmonic component filter to the circular
harmonic component in the object one may do the same in
the Fourier plane by taking $G_n(\rho) = F_n(\rho)$. This
filter, however, does not take into consideration the fact
that the energy content in each harmonic component is very
object dependent, causing an appreciable reduction in
light efficiency and filter selectivity. To remedy this
drawback we may introduce a weighting factor into
each characteristic filter function. Also, recalling the
high efficiency and selectivity obtained with phase-only
filters (Refs. 4 and 5) one is tempted to use the phase
information as the major contributor for generating the
filters. Thus we define the phase-only characteristic
circular harmonic functions,

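\[ G_n(\rho) = \frac{\int_0^{2\pi} F(\rho,\phi) \exp(jn\phi)\, d\phi}{\left| \int_0^{2\pi} F(\rho,\phi) \exp(jn\phi)\, d\phi \right|}, \tag{11} \]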

where p* is the size of the filter and the
indistinguishable low frequency signal has been eliminated
(i.e., $G_n = 0$ outside the noted region). The useful
interval depends on the pattern to be recognized and should
be chosen in such a way that it contains the distinguishing
information.
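
The construction of a phase-only circular harmonic function can be sketched numerically as follows. This is an illustrative Python sketch, not the authors' program: the polar sampling of the Fourier transform, the band limits `rho_min` and `rho_max`, the sign convention of the exponent, and the function name are all assumptions.

```python
import numpy as np

def phase_only_harmonic(F_polar, n, rho, rho_min, rho_max):
    """Phase-only n-th circular harmonic function in the spirit of Eq. (11).

    F_polar : complex array of shape (rings, angles), the Fourier transform
              sampled at radii `rho` and uniformly spaced angles in [0, 2*pi).
    rho_min, rho_max : hypothetical band limits; the result is zero outside.
    """
    n_phi = F_polar.shape[1]
    phi = 2 * np.pi * np.arange(n_phi) / n_phi
    # Angular integral of F(rho, phi) exp(j n phi), approximated by a sum.
    h = (F_polar * np.exp(1j * n * phi)).sum(axis=1) * (2 * np.pi / n_phi)
    magnitude = np.abs(h)
    G = np.zeros_like(h)
    band = (rho >= rho_min) & (rho <= rho_max) & (magnitude > 0)
    G[band] = h[band] / magnitude[band]   # keep the phase, discard the amplitude
    return G

rng = np.random.default_rng(1)
rho = np.linspace(0.0, 1.0, 11)
F_polar = rng.normal(size=(11, 32)) + 1j * rng.normal(size=(11, 32))
G2 = phase_only_harmonic(F_polar, n=2, rho=rho, rho_min=0.25, rho_max=0.85)
```

Inside the chosen band every ring of the result has unit modulus, so only the phase of the harmonic survives; outside the band the filter is zero, which corresponds to the low frequency suppression described in the text.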

The most convenient way to proceed is to invoke a
specific example. Previous experiments with block
letters indicated that it is most difficult to distinguish
between the letters P and F such as shown in Fig. 1.
Thus it is interesting to investigate this difficult case
with various filters made to recognize one of these
letters against the other. In a computer experiment
filters were generated to recognize the letter P from the
input pattern of Fig. 1. The performance with a regular
matched filter is shown in Fig. 2(a) with the
autocorrelation peak normalized to unity. It is clear that
the cross correlation with F is quite high, much higher
than the correlation with the rotated P. The
autocorrelation peak of a phase-only matched filter is 54
times as high [Fig. 2(b)] but the cross correlation with F
is high too, again much higher than that with the rotated
P. A circular harmonic component with $n = 0$ produces the
output pattern shown in Fig. 2(c), demonstrating complete
rotation invariance but not very good selectivity. The
improvement obtained by using a phase-only circular
harmonic component filter of the type represented by
Eq. (11) is indicated by Fig. 2(d). Low frequency
suppression for the two last experiments was the same.

Fig. 1. Input pattern for the computer experiments from which the
letter P should be recognized.

The experimental results shown in Fig. 2 are,
respectively, summarized in lines 1-4 of Table I. $I_P$ is
the autocorrelation peak intensity normalized to 1 for the
classical matched filter while $I_P/I_F$ is the ratio
between the peak for P to that for F. The last column
indicates if the filter is completely rotation invariant or
not.

III. Phase Amplitude Composite Filter Generation

The good performance of the new filter is still
deteriorated by the presence of a cross-correlation peak.
To suppress this peak one must also include in the filter
function some information about the pattern to be
rejected. This can be achieved by using the concept of
the composite filter (Ref. 6) as also implemented for the
circular harmonic filters (Ref. 7). Figure 3 is the output
pattern obtained by using such a rotation-invariant complex
amplitude filter (see also line 5 in Table I). In
principle one could use the same procedure with the new
phase filters; however, due to the rapid fluctuations of
the intensity over the output plane this is too difficult.
Thus to suppress the cross-correlation peaks one may

Fig. 2. Output distribution for (a) regular matched filter; (b) phase-only matched filter; (c) harmonic component (n = 0) filter; and (d)
harmonic component (n = 0) phase-only filter.


Table I. Performance Comparison for the Various Filters (See Text for
Details); Parameters a, and r, Define the Weight of Each Component of
the Composite Filters

Filter                                                    I_P = |C(0;α)|^2   I_P/I_F   Rotation invariant
(1) Matched filter                                        1                  1.42
(2) Phase-only filter                                     54                 2.8       No
(3) Circular harmonic component filter, N = 0             0.44               1.7       Yes
(4) Phase-only circular harmonic component filter, N = 0  7.5                3.5       Yes
(5) Composite filter = u,f., + u2 P1                      0.55               2.0       Yes
(6) Composite filter = u,F, + u,F,                        3                  5.5       Yes

Fig. 3. Output distribution with a harmonic component composite
filter.

use a smoother characteristic function for the unwanted
patterns (F in this case) in a composite filter. One
possible choice can be the circular harmonic component
filter employed in generating the output of Fig.
2(c). This way we may compose a filter where we
utilize the high light efficiency of phase-only filters for
the pattern to be recognized and modify it with the
complex functions of the patterns to be rejected.

With the above considerations in mind we generate
the characteristic filter function for P according to Eq.
(11) for the $n = 0$ circular harmonic. For the same
circular harmonic component we generate the circular
harmonic filter for F according to the relation

\[ G_F(\rho) = \int_0^{2\pi} F_F(\rho,\phi)\, d\phi, \tag{12} \]

and combine them in a composite filter.

A scan along the diagonal of the filter is shown in Fig.
4. It turns out that for real objects, as is the situation
in our experiments, the zero-order phase-only circular
harmonic has only the values zero or $\pi$, leading to a
binary, bipolar amplitude filter with values 1 and -1.
The plot in Fig. 4 represents such a filter made for P,
modified by the complex filter function prepared for
the zero-order circular harmonic of the letter F. The
output pattern for this filter is shown in Fig. 5.
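
The statement that the zero-order phase-only harmonic of a real object is bipolar can be checked with a short numerical sketch. The code below is illustrative only: the small bitmap standing in for the letter P, the ring radii, the number of angular samples, and the function name `polar_spectrum` are assumptions, and the Fourier transform is evaluated by direct summation rather than by any method described in the paper.

```python
import numpy as np

def polar_spectrum(image, rho, n_phi):
    """Fourier transform of a real image evaluated directly on a polar grid.

    rho is in cycles per pixel; angles are uniformly spaced in [0, 2*pi).
    """
    ny, nx = image.shape
    y, x = np.mgrid[0:ny, 0:nx].astype(float)
    phi = 2 * np.pi * np.arange(n_phi) / n_phi
    u = rho[:, None] * np.cos(phi)            # shape (rings, angles)
    v = rho[:, None] * np.sin(phi)
    kernel = np.exp(-2j * np.pi * (u[..., None, None] * x + v[..., None, None] * y))
    return (image * kernel).sum(axis=(-2, -1))

# Hypothetical 5 x 4 bitmap standing in for the letter P.
letter_p = np.array([[1, 1, 1, 0],
                     [1, 0, 0, 1],
                     [1, 1, 1, 0],
                     [1, 0, 0, 0],
                     [1, 0, 0, 0]], dtype=float)

rho = np.linspace(0.05, 0.45, 8)
F = polar_spectrum(letter_p, rho, n_phi=32)
h0 = F.sum(axis=1) * (2 * np.pi / 32)   # zero-order angular integral per ring
binary_filter = np.sign(h0.real)        # phase 0 -> +1, phase pi -> -1
```

Because the transform of a real object satisfies $F(\rho,\phi+\pi) = F^*(\rho,\phi)$, the angular integral `h0` is real up to rounding error, so its phase can only be zero or $\pi$ and the phase-only filter reduces to the values 1 and -1.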

The measurements performed on the outputs of Fig.
5 are summarized in line 6 of Table I. While line 5
represents the results for a filter composed of two
characteristic functions that served as filters in line 3,
the filter for line 6 is made out of a P function
corresponding to the filter in line 4 combined with an F
function corresponding to a filter of line 3. The
improved discrimination characteristic of the new
composite filter compared to Figs. 3 and 2(d) is evident.

IV.  Laboratory  Experiments 

To verify the practicability of the new procedure the
computer experiments were repeated in the laboratory.
We employed the same IBM PC that was used in
the simulations to generate the input pattern of Fig.
6(a) and holographic filter functions like the one
shown in Fig. 6(b). To generate the filters the Fourier
plane was sampled into 64 rings of equal width and the
holograms were plotted on a regular dot printer. The
working patterns were obtained by a 25-fold photographic
reduction onto a regular photographic film.

Fig. 4. Bipolar amplitude scan along one diameter of a phase
amplitude harmonic component filter.

Fig. 5. Output pattern for the filter of Fig. 4.


Figure 6(c) shows the output pattern for a phase-only
circular harmonic component filter (corresponding to
line 4 in Table I) superposed by a line along which the
intensity scan of Fig. 6(d) was obtained. Figures 6(e)
and (f) are the corresponding patterns for the composite
filter of Fig. 4 (line 6 in the table). The correspondence
with the computer calculations is excellent.
Note that the correlation peaks appear at the centroids
of the recognized patterns that are shifted during
rotation. It is interesting that the cross correlation with
the additional letter O was also reduced with the
composite filter.

Fig. 6. (a) Input pattern for laboratory experiment; (b) filter made
for recognizing P; (c) output pattern with phase-only harmonic
component filter superposed by a line along which the intensity scan
of (d) was taken; (e) output for a phase-amplitude harmonic component
composite filter of Fig. 4 with intensity scan (f).


V.  Conclusions 

In this work we introduced a new kind of phase-only
filter, the phase-only circular harmonic component
filter, and the circular harmonic component phase
amplitude composite filter. The selectivity and light
efficiency of the composite filters were improved by
combining the advantages of the phase-only filters with
those of the complex amplitude filters. The superior
performance of these filters was demonstrated by computer
simulations and laboratory experiments. We
worked with the zero-order harmonic because the letter
P had a very large fraction of its energy in this
harmonic. For the detection of F, for example, a higher
harmonic is better. In any case, a set of filters for a
specific job may include many harmonic orders. However,
to preserve rotation invariance, each filter should
contain information using the same harmonic component
of all the input patterns. The experiments described
in this paper are only a sample of those actually
performed and they represent the most problematic
cases.

The initial goal of the present research project of
employing low resolution devices was preserved and
demonstrated by using a simple dot printer for the
generation of the filters and regular photographic film
in the actual experiments. It is also worthwhile noting
that this entire paper represents just a new example of
the general procedure outlined in Ref. 1.

This  work  was  partially  supported  by  contract 
N00014-86-K-0591  with  the  Office  of  Naval  Research. 


Abstract

The number of interconnections in a (fully connected) backward
error propagation neural network grows quadratically with the number
of neurons in the network. The memory (and time) requirements
for handling a large number of interconnections can therefore become
a serious impediment for simulations and implementations of neural
networks. Another problem is that the media used by most neural
network implementations (neural computers) have only a limited ability
to discriminate intensity levels. In order to represent neural networks
efficiently in optical implementations (optical computers) and analog
electronic implementations, the set of possible values an interconnection
strength (weight) can have should be small. To abate these problems,
the possibility of discretizing the weights of neural networks is
investigated. Weight discretization will impair the performance of the
neural network. This can be compensated for by increasing the number
of neurons and/or the number of hidden layers. A new discretization
method is developed and its performance is compared to others.


1  Introduction 

1.1  Background  and  definitions 

Perceptron-like neural networks can be trained by teaching them
patterns. A pattern consists of a set of elements (pixels). Each pixel can
assume a continuous or a discrete value. In the discrete case^1, the
possible set of pixel values is often limited. Typical pixel value sets,
used in artificial neural networks, are: {0,1} and {-1,1}. Usually, a
pattern is presented to the neural network by feeding each of the pixels
to a different input neuron, i.e. a neuron of the first layer of the
neural network. Therefore the number of input neurons is equal to the
number of pixels in the pattern.

Most artificial neural network simulations consist of two phases: a
training or learning phase, and a recall or use phase. During the training
phase patterns are presented to the network. The interconnection strengths
(also called synaptic strengths or weights) of the neural network are
adapted to conform to these patterns by means of a neural network learning
rule (for example, the backward error propagation learning rule). Once the
weights have stabilized, the network is called fully trained. During the
recall phase input patterns are presented to the neural network. Based on
the fixed weights, corresponding outputs, which are the activation values of
the neurons of the highest layer, are generated by the network. This form of
neural network training is called off-line training. Off-line training is
often crucial, since it separates the normally time consuming training from
the recall process and therefore speeds up the use of the neural network
tremendously.

In ‘neural network learning rules with a teacher’, two patterns are
presented to the network: an input pattern and a target pattern (the
‘teacher’) which is the desired output for the neural network. In these
networks the total output of the network has to converge towards the target
pattern; i.e. the activation values of the output neurons have to converge
towards the pixel values of the target pattern. In auto-associative learning
the input patterns are the same as the target patterns. Auto-associative
learning is therefore used to train the neural network to remember a set of
patterns. One of the main applications of auto-associative learning is image
reconstruction or recalling a pattern if only a partial or disabled input is
available. For example: a knowledge base which can handle fuzzy data.

In  hetero-associative  learning  the  input  and  target  patterns  are  usually 
different.  Hetero-associative  learning  is  therefore  used  to  train  the  network 
to  associate  each  of  the  input  patterns  with  its  corresponding  target  pattern. 
For  example:  association  of  geological  information  of  a  certain  geographical 
area  with  the  presence  of  fossil  fuels  there. 

1 Be aware of the difference between discrete weights and discrete pixel values.

1.2 Problem definition


Since the number of interconnections in a fully interconnected
back-propagation neural network grows quadratically with the number of
neurons in the network, the storage (and time) needed for handling them is
often a problem for neural network simulations and implementations. Discrete
values (from a limited set) use less storage and can be handled more
economically than continuous values. Therefore, discretization of the
weights is investigated.

Discretization is essential for all kinds of implementations of neural
networks, since most information media used can only discriminate a small
set of intensity levels. Examples are optical implementations (optical
computing) [Caulfield-88] and analog electronic implementations
[Thakoor-86].

The research goals for the work presented in this paper are to develop
discretization methods for back-propagation neural networks, to create a
software environment for the simulation of neural network weight
discretization, to test the discretization methods by computer simulation,
and to find rules of thumb for expanding the neural network in order to
compensate for the loss of information capacity due to the discretization of
the weights. Before going to the discretization methods, some related work
is discussed.


1.3  Prior  work 
1.3.1  Hopfield  model 

In his famous 1982 paper, Hopfield [Hopfield-82] studied a ‘clipped’
weight matrix ($T_{ij}$). He replaced $T_{ij}$ by ±1, the algebraic sign of
$T_{ij}$. The purposes were to examine the necessity of a linear synapse
supposition (by making a highly nonlinear one) and to examine the efficiency
of storage. He found little performance^2 degradation. The number of
recallable patterns was (analytically) $2/\pi$ of the number with linear
$T_{ij}$'s. Thus severe discretization causes only mild degradation. To
restore performance, the number of neurons would have to be increased by
$\pi/2$.

2 In this paper, the ‘performance’ of a neural network is its ability to learn (and recall)
a certain amount of information.

1.3.2 Winner-take-all models

Stirk et al. [Stirk-87] addressed a variety of non-Hopfield models from
the viewpoint of performance sensitivity to analog optical inaccuracies. The
results for simple winner-take-all networks are bad. Furthermore “big $N$”
cases ($N = 64$) are significantly worse than “small $N$” ($N = 16$) cases.
Optics seems advantageous over electronics only for very large $N$, say, 10*
to $10^6$. This means optics is accurate enough only for small $N$, but
small $N$ is probably better done electronically.


1.3.3  Farhat’s  adaptive  method 

In a paper showing how to implement large neural networks ($10^3$ - $10^4$
neurons), Farhat et al. [Farhat-86] reformulate the Hopfield model for
two-dimensional inputs and outputs and four-dimensional interconnects. They
clip the interconnection matrix in various ways ({0,1}, {-1,1}, {-1,0,1},
etc.) and find that with “adaptive thresholds” the {0,1} interconnection
pattern (easy to implement optically and electronically) can achieve the same
performance level as a multivalued interconnection pattern. In effect, they
have restored the $2/\pi$ loss observed by Hopfield by using adaptive
thresholds.


1.3.4  Summary  of  prior  work 

What  is  known  from  these  prior  studies  is  that  some  neural  network 
designs  are  far  more  noise  prone  than  others  and  that  compensatory  methods 
such  as  adding  more  neurons  or  allowing  adaptive  thresholds  can  restore  the 
performance  of  the  network. 


2 Discretization of back-propagation networks

2.1  Approach 

Among the multi-layer neural network learning rules capable of both
auto-associative and hetero-associative learning, the backward error
propagation learning rule, also known as back-propagation or error
propagation [Werbos-74] [Rumelhart-86], is the most widely used and is simple
to use [Hecht-Nielsen-88]. The back-propagation learning rule was therefore
chosen for the experiments. The experiments are based on the discretization
methods which are described in paragraph 2.2.

The number of neurons per layer can vary; $N_l$ indicates the number of
neurons in layer $l$ ($1 \le l \le L$), where $L$ is the total number of
layers (or slabs) in the network including the input and the output layer.
The interconnection weights between two layers of a neural network can be
represented by a matrix $W^l_{ij}$, where $l$ represents the level of the
matrix ($1 \le l < L$). The level $l$ is the ordinal number of the lower one
of the two layers connected by $W^l_{ij}$. The indices $i$ and $j$ determine
the ordinal number of the neuron in the lower and upper layer respectively.
The weights $W^l_{ij}$, as used in ordinary back-propagation models, can
theoretically assume any (continuous) value:

\[ -\infty < W^l_{ij} < \infty. \]
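
To make the storage argument concrete, the number of weights in such a layered network can be counted directly from the layer sizes. The following Python sketch is illustrative; the function name `count_weights` and the example layer sizes are assumptions, and bias terms are not counted.

```python
def count_weights(layer_sizes):
    """Number of interconnections when adjacent layers are fully connected.

    layer_sizes[l - 1] plays the role of N_l; there is one weight matrix
    per pair of adjacent layers, with N_l * N_(l+1) entries.
    """
    return sum(lower * upper for lower, upper in zip(layer_sizes, layer_sizes[1:]))

small = count_weights([4, 3, 2])     # two weight matrices: 4*3 + 3*2
doubled = count_weights([8, 6, 4])   # every layer twice as large
```

Doubling every layer size multiplies the weight count by four, which is the quadratic growth that motivates storing each weight with as few discrete levels as possible.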

The discretization methods used produce discrete weights. In
general there are $D$ discretization levels, where $D$ is a finite integer.
Since the set of desired discretization values can be mapped onto any
sequence of numbers using a bijection, any set with the same cardinality
will suffice. In this paper the choice is made for a sequence of consecutive
integers equally divided among positive and negative numbers:

(n-  |  n=l,2,...,c). 

Discretization of the weights will impair the performance of the neural
network, because there is a loss of information capacity. This is compensated
for by increasing the number of neurons and/or the number of hidden layers of
the network. In other words: discretized weights contain less information
than continuous ones; this is compensated for by using more of them. The
discretization methods used are discussed in the next paragraph.


2.2 Three discretization methods

2.2.1 The multiple-thresholding method

The multiple-thresholding method is the simplest of the three
discretization methods used. It starts by fully training the neural network,
using the back-propagation learning rule; i.e. iterate (over steps 2 through
4 of the algorithm in appendix A) until convergence^3 is reached. Then
discretize the continuous weights into discrete valued weights using a
nonlinear function (usually a multiple-threshold). The weight matrices so
obtained are referred to as the discrete network. The original set of weight
matrices with continuous weights is called the continuous network. Chiueh and
Goodman [Chiueh-88] have applied this method using three discretization
levels. They found that about 15%-50% of the networks did not work.
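
A multiple-threshold function of the kind described above can be sketched as nearest-level rounding, which is equivalent to placing one threshold midway between each pair of adjacent levels. The Python sketch below is an illustration under stated assumptions: the function name, the `scale` parameter that relates integer levels to the continuous weight range, and the example weights are hypothetical and are not taken from the paper.

```python
import numpy as np

def multiple_threshold(weights, levels, scale=1.0):
    """Map each continuous weight to the nearest discrete level.

    weights : array of continuous weights (any shape).
    levels  : the allowed discrete values, e.g. [-1, 0, 1] for D = 3.
    scale   : hypothetical factor; weights are divided by it before rounding.
    """
    levels = np.asarray(levels, dtype=float)
    # Index of the closest level for every weight (thresholds at midpoints).
    nearest = np.abs(weights[..., None] / scale - levels).argmin(axis=-1)
    return levels[nearest]

continuous = np.array([-2.3, -0.2, 0.4, 0.6, 5.0])
discrete = multiple_threshold(continuous, levels=[-1, 0, 1])
```

Weights beyond the outermost levels saturate at the extreme values, so the discrete network can differ substantially from the continuous one when the trained weights are large.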


2.2.2  The  direct  discretization  method 

In the direct discretization method, the neural network is initialized with
discrete weights, which have random values within the discrete range. The
forward propagation is similar to the normal back-propagation learning rule
(step 2 of the algorithm in appendix A). During the backward propagation
(step 3 of the algorithm in appendix A), the weights are updated only if the
difference in weight ($\Delta W^l_{ij}$) is big enough to change the weight
into one of the other possible discrete values. This method does not work for
the standard back-propagation learning rule (see appendix B for a proof).
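
The following Python sketch illustrates one way such an update rule can stall. It is not the argument of appendix B; the rounding rule used to decide whether a change is "big enough", the three-level range, and the step size of 0.1 are assumptions chosen for illustration.

```python
import numpy as np

def direct_discrete_update(w, delta_w, w_min=-1.0, w_max=1.0):
    """Apply delta_w only if it moves a weight to another integer level.

    Assumption: a change counts as big enough when w + delta_w rounds to a
    different integer; the result is clipped to the allowed range.
    """
    return np.clip(np.rint(w + delta_w), w_min, w_max)

w = np.array([0.0, 1.0, -1.0])       # discrete weights (three levels)
small_step = np.full(3, 0.1)         # a small weight change, applied repeatedly
for _ in range(100):
    w = direct_discrete_update(w, small_step)

w_after_large = direct_discrete_update(np.array([0.0]), np.array([0.7]))
```

Because each small change is discarded rather than accumulated, the weights in `w` never leave their initial levels, whereas a single sufficiently large change does move a weight.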


2.2.3  The  continuous-discrete  learning  method  (CDLM) 

This newly developed method, schematically shown in figure 1, starts off
with the multiple-thresholding method (paragraph 2.2.1, and (a) to (f) in
figure 1). Next the original input pattern (a) is fed (h) into the discrete
network (g). The outputs obtained by forward propagation (step 2b of
appendix A) are compared (i) with the target pattern (e) and the errors
($\delta$'s) are back-propagated (j) through the continuous network (c).
Next, the weights of the continuous network are discretized (f) as before and
the process starts all over again until the system reaches convergence. The
fully trained discrete network (g) can now be used for the recall phase.

3 From now on ‘reaching convergence’ will stand for reaching the convergence
criterion (see paragraphs 1.1 and 3.1) or another limiting (normally
time-based) factor, e.g. the maximum number of iterations allowed.
‘Convergence’ itself will stand for converging into the desired range
($\epsilon$-range), as opposed to convergence to any value.

This approach leads to an increase in the total number of iterations
needed. The process can be sped up by skipping the full training of the
continuous network, since starting with a fully trained continuous network
is convenient, but not necessary.
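
The loop described for the CDLM can be sketched in a few lines of Python. This is a minimal illustration, not the PASCAL environment used for the experiments. It assumes a network with a single weight layer (so back-propagation reduces to the delta rule), sigmoid output neurons, a bias supplied as a constant input, and it skips the initial full training of the continuous network, which the text notes is optional. The function names, the level set, the learning rate, and the training patterns are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discretize(w, levels):
    """Nearest-level (multiple-threshold) discretization of the weights."""
    levels = np.asarray(levels, dtype=float)
    return levels[np.abs(w[..., None] - levels).argmin(axis=-1)]

def cdlm_train(x, t, levels, eta=0.5, eps=0.2, max_iter=5000, seed=0):
    """Forward pass through the discrete network, update of the continuous one."""
    rng = np.random.default_rng(seed)
    w_cont = rng.uniform(-0.5, 0.5, size=(x.shape[1], t.shape[1]))
    for iteration in range(max_iter):
        w_disc = discretize(w_cont, levels)     # (f): continuous -> discrete
        y = sigmoid(x @ w_disc)                 # (h): forward through discrete net
        err = t - y                             # (i): compare with the targets
        if np.all(np.abs(err) <= eps):          # every output inside its range
            return w_disc, iteration, True
        delta = err * y * (1.0 - y)             # output-layer deltas
        w_cont += eta * x.T @ delta             # (j): update continuous weights
    return discretize(w_cont, levels), max_iter, False

# Hypothetical training set: logical OR, with a constant bias input of 1.
x = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([[0], [1], [1], [1]], dtype=float)
levels = [-4.0, -2.0, 0.0, 2.0, 4.0]
w_disc, n_iter, converged = cdlm_train(x, t, levels)
```

The key design point is that the error is always measured on the discrete network, while the small corrections accumulate in the continuous weights until they are large enough to cross a threshold. Whether a given run converges depends on the level set, the deviation parameter, and the patterns, so `converged` should be checked rather than assumed.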


3  Evaluation  of  the  discretization  methods 

3.1  The  back-propagation  model  used 

The experiments performed are based on the back-propagation learning
rule. The characteristics of the back-propagation model used are: it
is fully connected between adjacent layers, it has no intralayer connections,
i.e. connections between neurons in the same layer, and no supralayer
connections, i.e. connections between neurons that are not in adjacent layers
[Rumelhart-86, figure 8.3]. The following assumptions are made: a negative
weight is inhibitory, a zero weight means no connection and a positive
weight is excitatory. Thus, in spite of the full connectivity (between
adjacent layers), the situation of two neurons without an interconnection can
be represented theoretically in this way.

The patterns used to train the network were free of noise. They are
presented to the neural network as a set of pairs of patterns. Each pair
consists of an input pattern and its corresponding target output pattern.
A pattern consists of a rectangular matrix of pixel values (height x width)
which is mapped onto the one-dimensional set of input neurons (the neurons
in layer one). Let $h^i$ be the height of the input patterns and $w^i$ the
width ($i$ stands for input). The pixel value of input pattern $j$
($P_{j,mn}$) is mapped onto input neuron $(m - 1)w^i + n$, where $m$
indicates the row of the pixel in the pattern ($1 \le m \le h^i$) and $n$ the
column ($1 \le n \le w^i$). The patterns in the set, which are presented in
the order they are provided by the user, are fed repeatedly into the input
neurons of the neural network until convergence is reached. The convergence
criterion used is that all the activation values of the output neurons reach
their $\epsilon$-range. An $\epsilon$-range is the range near a desired
output, determined by the deviation parameter ($\epsilon$). The deviation
parameter is the maximum amount that an output activation value may
deviate from the target pattern value.
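
The pixel-to-neuron mapping and the convergence criterion can be written down compactly. The Python sketch below is illustrative; the function names and the example values are assumptions, and indices are 1-based to match the text.

```python
def input_neuron_index(m, n, width):
    """1-based input neuron for the pixel in row m, column n (both 1-based)."""
    return (m - 1) * width + n

def within_eps_range(outputs, targets, eps):
    """True when every output activation lies within eps of its target value."""
    return all(abs(o - t) <= eps for o, t in zip(outputs, targets))

first = input_neuron_index(1, 1, width=5)   # top-left pixel of a 4 x 5 pattern
last = input_neuron_index(4, 5, width=5)    # bottom-right pixel
done = within_eps_range([0.05, 0.93], [0.0, 1.0], eps=0.1)
```

With this row-by-row mapping a pattern of height 4 and width 5 occupies input neurons 1 through 20, and training stops as soon as `within_eps_range` holds for all output neurons.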

3.2 Implementation
3.2.1  Software  specification 

In order to perform the discretization experiments with all the necessary
flexibility, a portable (machine independent) back-propagation software
environment was developed by the author using the PASCAL programming
language [Jensen-78]. The main part of the software has been developed on
a personal computer. When some of the experiments took more than 24
hours to run on the personal computer, the work was moved to a Cray X-MP/24
supercomputer.

Some of the flexibility criteria for the software environment were: the
capability of handling both auto-associative and hetero-associative learning;
the ability to change the pattern size (height and width), the number of
patterns, the learning rate ($\eta$), the number of layers, the number of
neurons per layer (for each hidden layer), the number of discretization
levels, the deviation parameter, and the maximum number of iterations for the
(continuous-discrete) learning method; and the ability to choose a
discretization method, an initialization scheme for the weights, and a pixel
value set.

The most important outputs of the simulation system (for both the
continuous and the discrete network) are:

•  the stop criterion: whether the desired output is reached (within the
$\epsilon$-range) or the number of iterations reached its maximum

•  the output values (activation values of the output neurons) after each
iteration, if desired

•  the number of iterations made

•  the number of errors made (output activation values that reached
undesired values)

•  the number of output neurons that did not reach the desired output
(within the $\epsilon$-range)

•  the maximum deviation (between actual output activation value and
the desired output value)

The  user  can  choose  which  outputs  are  desired  for  specific  experiments. 
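
As an illustration only, the outputs listed above could be computed from one recall pass roughly as follows. This is a minimal Python sketch, not the author's PASCAL code; the function names `summarize_outputs` and `should_stop` and the default values are assumptions chosen to match the parameters quoted in this section.

```python
def summarize_outputs(outputs, targets, eps=0.05):
    """Return (non_converged, max_deviation, converged) for one recall pass.

    An output counts as converged when it lies within eps of its target.
    """
    deviations = [abs(o - t) for o, t in zip(outputs, targets)]
    non_converged = sum(1 for dev in deviations if dev > eps)
    max_deviation = max(deviations)
    return non_converged, max_deviation, non_converged == 0


def should_stop(outputs, targets, iteration, eps=0.05, max_iterations=20000):
    """Stop when every output is within the eps-range or the iteration cap is hit."""
    _, _, converged = summarize_outputs(outputs, targets, eps)
    return converged or iteration >= max_iterations
```

For example, `summarize_outputs([0.97, 0.2], [1.0, 0.0])` reports one non-converged output, because only the second value lies outside the 0.05 range.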
3.2.2  Methodology


The most promising discretization method is the CDLM: the direct
discretization method does not work, and the CDLM easily outperforms
the multiple-thresholding method because the former includes and improves
on the latter.

Two approaches were taken to test both the CDLM and the
multiple-thresholding method. The first was a systematic ‘search’ through
the state space of possible experiments. The starting position was the
smallest possible network, using two discretization levels, since this is the
preferred number for most neural network implementations. The next variable
to vary was therefore the pattern size, which is the same as the input size.
Then both auto- and hetero-associative learning were tested. The number of
patterns was the next variable to vary. This meant starting off with a
two-layer system, which would be enlarged in further experiments. Because
the number of possible experiments grew exponentially, a second approach
was taken.

Here,  the  collection  of  patterns  was  fixed.  This  means  a  fixed  number  of 
patterns,  a  fixed  pattern  size,  a  fixed  number  of  input  and  output  neurons 
and,  in  this  case,  a  choice  was  made  for  hetero-associative  learning.  The 
central  parameter  in  this  approach  is  the  number  of  discretization  levels. 


3.2.3  Parameter  definition 

This paragraph discusses the parameters which were kept constant in
most of the experiments. For perfect recall (i.e. output activation values
within their ε-range) using noise-free inputs, it turned out that the
higher the learning rate, the faster the convergence. Except in dedicated
experiments, the value of the learning rate (η) was kept at 0.5 [Caudill-88].
The value used for the deviation parameter (ε) is 0.05. Random values in
the range [-0.1, 0.1] were used to initialize the weights. The pixel value set
used is {-1, 1}. For the nonlinear function required in both the
multiple-thresholding method and the CDLM, a multiple threshold with rounding
off to the nearest pixel value was used. A typical figure for the maximum
number of iterations is 20,000. The local thresholds (θ's) or biases
[Rumelhart-86, p331-] were kept at zero.
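
The rounding step mentioned above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the helper `evenly_spaced_levels` is hypothetical, since this section does not say how the discretization levels are spaced, and the tie-breaking rule (the first listed level wins) is likewise an arbitrary choice.

```python
def round_to_nearest(value, levels):
    """Map a continuous value to the closest entry of a finite value set."""
    return min(levels, key=lambda level: abs(level - value))


def evenly_spaced_levels(low, high, count):
    """Hypothetical helper: `count` (at least 2) equally spaced levels on [low, high]."""
    step = (high - low) / (count - 1)
    return [low + k * step for k in range(count)]
```

With the pixel value set {-1, 1}, `round_to_nearest(0.3, [-1, 1])` yields 1 and `round_to_nearest(-0.2, [-1, 1])` yields -1.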

In  the  second  approach  a  number  of  variables  were  fixed  in  order  to  limit 
the  state  space.  These  experiments  used  hetero-associative  learning,  three 
by  five  pixel  patterns,  a  learning  rate  of  0.5,  and  two  to  four  layers. 
3.3  Results


“Being able to learn and perfectly recall (associate) a set of noise-free
patterns” is taken as the measure for comparing the performance of
continuous and discrete networks.

What could be expected intuitively is confirmed by the experiments: the
performance of the CDLM is better than that of the multiple-thresholding
method. The outputs of the multiple-thresholding method were often outside
the ε-range. Sometimes wrong results were obtained when rounding⁴ was
applied to the outputs. In general, rounding can be used to obtain a
{0, 1}-result from an output neuron that did not converge into the
ε-range.

The CDLM, on the other hand, usually achieves much better results. The
first approach (see paragraph 3.2.2) emphasized restoring the performance
of the neural network using the minimum number of discretization levels,
which is two. In the case of associating one set of two patterns by the
simplest network of one input neuron, one output neuron, and a variable
number of hidden layers and neurons in them, a two-level discretization
works very well (see figure 2). The two-layer network does not converge to
a value within the ε-range but gives the right answer after rounding. The
performance of a three-layer system with one neuron in the hidden layer
is worse than that of the two-layer network, but adding neurons to the
hidden layer increases the performance. With five neurons in the hidden
layer the ε-range of 0.05 is reached. Figure 3 depicts the situation for
four layers. In order to reach ε-accuracy, the minimum numbers of neurons
needed in the second & third layer are (5&3), (2&4), and (1&5).
If two patterns are stored, the graph (see figure 4) is less smooth but the
same behavior can be observed. Fourteen hidden neurons are needed in the
second layer to reach ε-accuracy. Note that the three-layer network with one
or two neurons in the hidden layer produces faulty results when rounding
is used. In order to reach the right outputs after applying rounding in the
case of four layers, the minimal numbers of units needed in the second &
third layer are (2&9), (3&6), (4&5), (5&4), (6&4), (7&4), (8&3), and (9&4).
However, rounding sometimes gives wrong results for some hidden layer
sizes larger than these minima. The number of extra neurons needed to
restore the performance is relatively high. Other results showed that this
relative overhead became smaller for bigger networks.

4  From now on, “rounding” denotes rounding off to the nearest pixel value.
In some of the smaller networks the activation values of the output
neurons remained constant during the discrete training. In these cases the
performances of the multiple-thresholding method and the CDLM are equal.

The emphasis of the second approach is on comparing results using
different numbers of discretization levels. The pattern set consists of nine
pairs of character-like patterns. The continuous network could perfectly
associate them all after 372 iterations.
The table below shows that as the number of discretization levels increases,
the number of non-converging outputs decreases. Perfect recall was obtained
at seven discretization levels. A further increase leads to a decrease in the
number of iterations needed for perfect recall. A similar observation can
be made for the continuous network: adding neurons to the network leads
to faster convergence (fewer iterations needed).

Sometimes  the  performance  of  the  network  reached  a  maximum,  without 
reaching  total  convergence.  In  order  to  compensate  for  this,  the  observed 
maximum  performance  is  stored  and  used  as  final  result. 

If  the  CDLM  starts  with  a  full  training  of  the  continuous  network,  the 
number  of  iterations  needed  for  training  the  discrete  network  varies  from  one 
to  a  number  of  iterations  comparable  to  the  number  of  iterations  needed  for 
the  training  of  the  continuous  network.  In  this  case,  the  total  number  of 
iterations  needed  for  the  CDLM  is  therefore  one  to  two  times  that  of  the 
continuous  network. 

In general, adding a new layer to the network without increasing
the total number of neurons in the network results in a performance
degradation. This can be explained by looking at the total number of
interconnections before and after the addition of a new layer. If a fully
connected two-layer network contains $N_1$ neurons in its input layer and
$N_2$ in its output layer, the total number of weights is $N_1 \cdot N_2$.
A hidden layer of $N_h$ neurons inserted between them gives
$N_h (N_1 + N_2)$ weights, so the number of neurons needed in an additional
(= hidden) layer to have the same number of connections is
$N_h = N_1 N_2 / (N_1 + N_2)$.


# discr. levels   # iterations   # non-converg.   max. aberration

       2              10000            67              0.88
       3              10000            18              0.50
       5              10000             2              0.12
       7                 69             0              0.00
       9                 47             0              0.00
4  Conclusions  and  future  work

Of the three discretization methods proposed, the CDLM works better
than the multiple-thresholding method, and the direct discretization method
is unusable. A portable neural network software environment has been
created for performing the discretization experiments. As intuitively expected,
the finer the discretization (the more discretization levels), the better the
performance of the neural network. Nevertheless, using two discretization
levels, as desired for optical and electronic implementations, gives
reasonable results. The results of a two-layer neural network are usually
good enough when using the CDLM.

If the CDLM starts with a fully trained continuous network, the total number
of iterations will be one to two times the number needed for the full
training of the continuous network.

Future work will consist of the search for other discretization methods
and of more experiments, which might yield better rules of thumb for
restoring the performance of the neural network. Furthermore, an analytical
treatment of the discretization methods will be explored. Another goal is
speeding up the simulation software by optimization and vectorization of
the program code. Finally, since most of the data is multidimensional, data
visualization techniques are being designed for representing the data.
APPENDIX  A


Back(ward Error) Propagation: the formulas


In this appendix, $a_{l,i}$ represents the activation value of neuron $i$ in layer
$l$ of the neural network, and $t_j$ is the target pattern value which corresponds
to neuron $j$ in the output layer.

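Back-propagation consists of the following steps:

(1)  Initialize the weights ($W_{l,j,i}$) and offsets ($\theta_{l,j}$).

(2a)  $a_{1,i}$ := input to the $i$-th neuron of the input layer.

(2b)  Forward propagation:

$$a_{l,j} := \frac{1}{1 + e^{-\left(\sum_i W_{l-1,j,i}\, a_{l-1,i} + \theta_{l,j}\right)}} \qquad \text{for } 2 \le l \le L$$

(3)  Backward propagation:

$$\Delta W_{l,j,i} := \eta\, \delta_{l+1,j}\, a_{l,i}$$

where

$$\delta_{l+1,j} = \begin{cases} (t_j - a_{l+1,j})\, a_{l+1,j}\, (1 - a_{l+1,j}) & \text{if } l = L - 1 \\ a_{l+1,j}\, (1 - a_{l+1,j}) \sum_k \delta_{l+2,k}\, W_{l+1,k,j} & \text{if } 1 \le l < L - 1 \end{cases}$$

next, add $\Delta W_{l,j,i}$ to $W_{l,j,i}$.

(4)  IF no(t enough) convergence THEN GOTO (2a)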
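
The steps of this appendix can be sketched in Python for a single training pattern. This is a hedged illustration rather than the PASCAL environment described in the report: it assumes the standard logistic activation, keeps all offsets at zero (as in the experiments), uses 0-based layer indices, and every function name is hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(sizes, rng, scale=0.1):
    """weights[l][j][i]: from neuron i in layer l to neuron j in layer l+1."""
    return [[[rng.uniform(-scale, scale) for _ in range(sizes[l])]
             for _ in range(sizes[l + 1])]
            for l in range(len(sizes) - 1)]


def forward(weights, pattern):
    """Step (2): return the activations of every layer (offsets kept at zero)."""
    activations = [list(pattern)]
    for layer in weights:
        prev = activations[-1]
        activations.append([sigmoid(sum(w * a for w, a in zip(row, prev)))
                            for row in layer])
    return activations


def backward(weights, activations, target, eta=0.5):
    """Step (3): compute the deltas layer by layer and update weights in place."""
    out = activations[-1]
    deltas = [(t - a) * a * (1.0 - a) for t, a in zip(target, out)]
    for l in range(len(weights) - 1, -1, -1):
        prev = activations[l]
        if l > 0:
            # Deltas for the layer below use the weights before this update.
            below = [prev[i] * (1.0 - prev[i]) *
                     sum(deltas[j] * weights[l][j][i] for j in range(len(deltas)))
                     for i in range(len(prev))]
        for j, row in enumerate(weights[l]):
            for i in range(len(row)):
                row[i] += eta * deltas[j] * prev[i]
        if l > 0:
            deltas = below


def squared_error(weights, pattern, target):
    out = forward(weights, pattern)[-1]
    return sum((t - a) ** 2 for t, a in zip(target, out))
```

Step (4) corresponds to calling `forward` and `backward` in a loop until the outputs are close enough to the targets or an iteration limit is reached.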
APPENDIX  B


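Proof of ‘direct discretization method’ limitations

The weight update formula for the highest level interconnection matrix (see
step 3 of appendix A) is equivalent to:

$$|\Delta W_{L-1,j,i}| = |\eta\, (t_j - a_{L,j})\, a_{L,j}\, (1 - a_{L,j})\, a_{L-1,i}|$$

since $0 < a_{l,i} < 1$ and $0 < \eta < 1$:

$$|\Delta W_{L-1,j,i}| < |(t_j - a_{L,j})\, a_{L,j}\, (1 - a_{L,j})|$$

Partial differentiation of this function shows that it has no local extremum
in the open interval (0, 1) x (0, 1) of the plane spanned by $a_{L,j} \times t_j$. The
maximum of this function will therefore be a boundary extremum of this
interval. Since the function equals zero for both $a_{L,j} = 0$ and $a_{L,j} = 1$,
two cases are left to be examined:

case 1: $t_j = 0$

$$|\Delta W_{L-1,j,i}| < |-a_{L,j}^2 (1 - a_{L,j})| = a_{L,j}^2 (1 - a_{L,j})$$

$$\frac{\partial}{\partial a_{L,j}} \left[ a_{L,j}^2 (1 - a_{L,j}) \right] = a_{L,j} (2 - 3 a_{L,j}) = 0$$

$a_{L,j} = 0$: minimum
$a_{L,j} = \frac{2}{3}$: maximum

maximum for $a_{L,j} = \frac{2}{3}$ $\Rightarrow$ $|\Delta W_{L-1,j,i}| < \frac{4}{27}$

case 2: $t_j = 1$

The bound becomes $a_{L,j} (1 - a_{L,j})^2$, which by the symmetry
$a_{L,j} \to 1 - a_{L,j}$ has the same maximum, $\frac{4}{27}$, at $a_{L,j} = \frac{1}{3}$.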
This is the maximum weight increase possible. So, when using fewer than
$\lceil 27/8 \rceil = 4$ discretization levels, the weights will never be updated. Since
the average weight update is much smaller than this maximum, the direct
discretization method is not usable for standard back-propagation neural
networks.
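
The size of this maximum can be checked numerically. The sketch below assumes that the bound on the output-layer update has the form |(t - a) a (1 - a)|, with a the output activation and t the target, both in [0, 1]; the function names and the grid resolution are illustrative choices.

```python
def update_bound(a, t):
    """|(t - a) * a * (1 - a)|: assumed bound on the output-layer weight update."""
    return abs((t - a) * a * (1.0 - a))


def grid_maximum(steps=300):
    """Largest bound over a (steps+1) x (steps+1) grid on [0, 1] x [0, 1].

    Returns a tuple (value, a, t) for the grid point where the bound peaks.
    """
    points = [k / steps for k in range(steps + 1)]
    return max((update_bound(a, t), a, t) for a in points for t in points)
```

On this grid the largest value agrees with 4/27 (about 0.148) and is found on the boundary t = 0 or t = 1, in line with the boundary-extremum argument of the proof.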
[Figure 1: The continuous-discrete learning method]

[Figure 3]
Abstract

One  of  the  most  fundamental  properties  of  a  neural  network  is  its 
(storage)  capacity.  It  determines  the  practical  usefulness  as  well  as  the 
(storage)  limitations  of  the  neural  network.  So  far,  the  capacity  of  neural 
networks  has  mainly  been  studied  for  specific  learning  rules  only. 

This paper presents some theoretical upperbounds on the (storage)
capacity of neural networks. These generally applicable upperbounds are
topology independent and learning rule independent. The problem of
capacity is approached from different points of view. An overall upperbound
based on combinatorics and a tighter upperbound from an
information-theoretical point of view are given. Also included is an upperbound
for linear neural networks (or discriminants).

For  general  reference  an  extensive  bibliography  on  the  subject  of  neural 
network  capacity  is  appended. 
INTRODUCTION

Much discussion is taking place about the usefulness of (artificial) neural
networks. The viability of their use depends to some extent on their limitations.
One of the most fundamental limitations is the (storage) capacity of neural
networks. One wants to be able to store and process as much information in a
neural network as possible. The capacity issue has many impacts on fundamental
and applied research on neural networks; cf. (DARPA-88). It is essential for
the work on connectivity and optimal topologies of neural networks (Fiesler-90).

Since the information presented to neural networks can be represented as
patterns, it is useful to examine the pattern capacity of a neural network. The
pattern-capacity ($C$) of a neural network is the number of different patterns that
can be stored in that neural network, where a pattern consists of a set of values
called pixels, and a pixel can assume any value from a (finite or infinite) set,
called the pixel value set. A pattern is said to be stored in a neural network if it
can be retrieved (within certain exactness limits) as an output from the network by
feeding the corresponding input into the network. In this paper, exact recall of
the patterns is assumed; this means that no errors are allowed in the (possible)
recall of a pattern.

Neural networks are characterized by their architecture (including their
topology) and their learning rules. Until now, neural network capacity research
has been oriented towards neural networks with a specific learning rule (cf.
the appended bibliography on capacity). In order to get an approximation of
the capacity of an arbitrary neural network, the concepts which are general
to neural networks have to be explored. To remain independent of topology
and learning rule, one has to restrict considerations to static entities like the
number of neurons in the network ($N$), the number of weights ($W$), and, in the
case of discrete neural networks, the number of discretization levels for the weights
($D$) and the pixel value set cardinality ($d$), which is the number of different
(input) values possible for a pattern pixel. A discrete neural network is defined
as a neural network with discrete interconnection strengths (weights). This work
is based on discrete neural networks: since all (computer) implementations of
theoretically continuous neural networks have finite precision, they can be seen
as discrete ones. For example, in a computer simulation of a continuous neural
network with a $b$-bit number representation, the number of discretization levels
and the pixel value set cardinality can be at most $2^b$. An introduction to
discrete neural networks and related definitions is given in (Fiesler-88).

AN  UPPER  UPPERBOUND 

Every weight in a neural network can assume $D$ different values, and there
are $W$ weights in the network. Therefore, using plain combinatorics, the number
of different patterns that can be represented (this is the number of different
distinguishable states) in a neural network is

$$D^W.$$

An input pattern is copied into the activation values of the input neurons. There
are $N_1$ input neurons and each can assume one of the $d$ different possible values.
The number of different input patterns that can be presented to a layered neural
network is

$$d^{N_1}.$$

In order to store information in a neural network, it has to flow through the
input neurons. Hence, the smaller of the two values given above will be the
information bottleneck for the network. The upperbound for the
pattern-capacity is therefore

$$\min\left(d^{N_1}, D^W\right).$$

Since $N_1 < W$ for multi-layered neural networks, for most networks
$d^{N_1} < D^W$, and in this case the upperbound will be $d^{N_1}$.
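
As a small worked illustration of this bound, the following Python sketch evaluates the minimum of the two counts. The function and parameter names are hypothetical and only mirror the symbols d, N_1, D, and W used above.

```python
def combinatorial_capacity_bound(d, n_input, D, n_weights):
    """Upper bound min(d**N_1, D**W) on the number of storable patterns."""
    input_patterns = d ** n_input      # distinct input patterns
    network_states = D ** n_weights    # distinct weight configurations
    return min(input_patterns, network_states)
```

For binary pixels and binary weights with three input neurons and six weights, `combinatorial_capacity_bound(2, 3, 2, 6)` gives 8, so the input side is the bottleneck.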

A  TIGHTER  UPPERBOUND 

This number, which is a theoretical capacity upperbound, can be lowered
if information-capacities are considered. Information theory defines the total
amount of information, or entropy, of a system to be $\sum_{i=1}^{n} p_i \log p_i^{-1}$, where $p_i$
is the probability of occurrence of state $i$ of the network, $1 \le i \le n$. (The
notation “log” stands for any base logarithm; a convenient choice is base $D$.) The
maximum amount of information is obtained when all $p_i$'s are equal
(Hamming-80); i.e. $p_i = n^{-1}$, and the total information is $\log n$. An upperbound for the
information-capacity of a neural network is therefore

$$\log D^W = W \log D.$$

Analogously, the information-capacity of an input pattern is

$$\log d^{N_1} = N_1 \log d.$$

If the number of patterns to be stored is $C$, the total input to the network is
then

$$C N_1 \log d.$$

If we let $C$ be the pattern-capacity of the neural network, this quantity has
to be equal to the total information in the neural network. Therefore, if we
combine both formulas, we get an upperbound for the pattern-capacity of a
neural network:

$$C \le \frac{W \log D}{N_1 \log d}.$$

This result, applied to fully interlayer connected neural networks, gives for a
two layer network ($L = 2$ and $W = N_1 N_2$):

$$C \le \frac{N_2 \log D}{\log d},$$

and for an auto-associative neural network ($N_1 = N_3$) which has three layers
($L = 3$ and $W = N_2 (N_1 + N_3)$):

$$C \le \frac{2 N_2 \log D}{\log d},$$

where the pattern-capacity is directly proportional to the number of neurons in
the hidden layer, since $D$ and $d$ are constants.
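
The information-theoretic bound is easy to evaluate numerically. The sketch below is illustrative only; it assumes the bound has the form W log D / (N_1 log d), and the function name is hypothetical.

```python
import math


def information_capacity_bound(n_weights, D, n_input, d):
    """Upper bound W log D / (N_1 log d) on the pattern-capacity C."""
    return (n_weights * math.log(D)) / (n_input * math.log(d))
```

For a two layer network with 4 input and 5 output neurons (W = 20) and D = d = 2, the bound evaluates to 5, the size of the second layer. For a three layer auto-associative network with layer sizes 4, 3, 4 (W = 24), D = 4, and d = 2, it evaluates to 12, which is twice the hidden layer size times log D / log d.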


UPPERBOUNDS  FOR  LINEAR  NEURAL  NETWORKS 


Linear problems are well understood mathematically (Pao-89). The
nonlinearity of neural networks is what makes them hard. So in order to get a grip on
the upperbound of the capacity of neural networks, the non-linearity is stripped
for a moment and linear neural networks (with interlayer connections only) are
observed. A two layer linear neural network with only interlayer connections
is known as a (linear) discriminant. Observe a simple linear neural network,
where

$$a_{l,j} = \sum_{i=1}^{N_{l-1}} W_{l,i,j}\, a_{l-1,i} \qquad \text{for } 1 \le j \le N_l,$$

in which $a_{l,i}$ represents the activation value of neuron $i$ in layer $l$. For the
input layer ($l = 1$), this value is equal to the pixel value for the corresponding
input neuron of the network. For a two layer network the system consists of
$N_2$ equations. In these equations the interconnection weights are the variables.
In a fully interlayer connected neural network there are $W = N_1 N_2$ weights,
and therefore $N_1 N_2$ degrees of freedom (independent variables), which means
that the system of linear equations is completely determined by giving $N_1 N_2$
variables a value.

When a pattern is presented to the two layer network, the activation values
are known and this gives $N_2$ equations in $N_1 N_2$ unknowns. The number of
degrees of freedom of a system of $E$ linearly independent equations in $U$ unknowns
is

$$\max(U - E, 0).$$

So in the previous case, the number of degrees of freedom is
$N_1 N_2 - N_2 = N_2 (N_1 - 1)$. Each new pattern, which has at least some component orthogonal
to the other patterns, gives a new set of $N_2$ equations in the same variables.
Thus after $P$ patterns, $N_1 N_2 - P N_2 = N_2 (N_1 - P)$ degrees of freedom are
left over. A system of linearly independent equations is solved when there are
no degrees of freedom left. This happens when $P = N_1$. So an upperbound
for the pattern-capacity of a two layer neural network when considering linearly
independent patterns to be stored completely (error-free) is $N_1$.

If we extend this to more layers and assume the activation values to be
known, we have to incorporate the other layers as well and get as an upperbound
for the pattern-capacity for a multi layer linear neural network:

$$C \le \sum_{l=1}^{L-1} N_l = N - N_L.$$

SUMMARY 


In this paper a number of analytically derived upperbounds on the
(storage) capacity of neural networks are presented. They are independent of the
network topology and the learning rule used. It is shown that the maximum
amount of information that can be stored in an arbitrary neural network is
normally limited by the number of input states, which is exponential in the number
of input neurons.

For an exact recall, the capacity upperbound can be ‘compressed’ to an
amount which is proportional to the total number of weights divided by
the number of input neurons of the network. For layered neural networks with
up to three layers, the upperbound becomes linear in $N_2$, the size of the second
layer.

An  upperbound  for  the  number  of  partially  orthogonal  patterns  that  can 
be  stored  in  a  linear  neural  network  is  proportional  to  N,  the  total  number  of 
neurons  in  the  network. 
Abstract

One  of  the  main  problems  in  current  (artificial)  neural  network  engineering  is 
the  lack  of  design  rules  for  neural  networks,  i.e.  how  many  layers  and  how  many 
neurons  per  layer  to  choose  for  a  fully  connected  layered  neural  network  with 
bidirectional  weights.  A  theory  is  developed  which  optimizes  the  topology  of  the 
neural  network  to  allow  a  maximum  potential  storage  capacity  with  a  minimum 
amount  of  neurons. 

Keywords: (artificial) neural networks, connectionism, neural network topology,
neural network statics, neural network connectivity, neural network capacity

Introduction 

Although the field of artificial neural networks, hereafter called neural networks, is
a rapidly growing one, some basic questions remain unanswered. One of the most
important problems is how to configure a neural network. Many neural network learning
rules apply to (fully connected) layered (first order) neural networks with bidirectional
weights (or interconnection strengths). A bidirectional connection is a connection that
has the same connection strength when used for either forward or backward
propagation. (If a neural network uses only unidirectional propagation, the interconnection
topology of the neural network is identical to one with unidirectional connections.)

For layered neural networks in general, one needs to determine the number of layers
and the number of neurons per layer. Since neural networks are used for processing
and storage of information, the ‘optimal’ topology for a neural network is usually one
which allows an optimal (information) storage capacity. Since the interconnection
strengths (weights) contain the information of the neural network, the information
capacity is proportional to the total number of weights in the network [1]. A fully
connected neural network will therefore have a higher information capacity than any
other interconnection scheme. However, in layered neural networks there are several
types of connections.


Counting  Weights 

In  layered  neural  networks  one  can  discriminate  three  classes  of  connections: 
Definition  :  An  interlayer  connection  is  a  connection  between  neurons  in  adjacent  layers 
of  the  neural  network. 

Definition  :  An  intralayer  connection  is  a  connection  between  neurons  of  the  same  layer 
of  the  neural  network. 

Definition  :  A  supralayer  connection  is  a  connection  between  neurons  that  are  neither 
in  adjacent  layers,  nor  in  the  same  layer  of  the  neural  network. 

A  sub-class  of  intralayer  connections  are  self-connections: 

Definition  :  A  self-connection  is  a  connection  which  originates  and  terminates  at  the 
same  neuron. 

A  neural  network  can  have  all  possible  connections: 

Definition : A plenary neural network is a neural network which has all possible
interlayer, intralayer, and supralayer connections; in other words it is a ‘truly’ fully
connected neural network.

The total number of weights ($W$) for a neural network with $L$ layers which has only
interlayer connections (i.e. it has neither intralayer nor supralayer connections) is
the sum of all possible connections (between each pair of adjacent layers) in the
network. The number of connections between two adjacent layers in a fully interconnected
network is equal to the product of the number of neurons in each of the layers. In order
to get the total number of weights for the complete network, a summation is needed
over the layers:

$$W = \sum_{l=2}^{L} W_l = \sum_{l=2}^{L} N_{l-1} N_l,$$

where $W_l$ stands for the number of weights between layer $l - 1$ and $l$, and $N_l$ is the
number of neurons in layer $l$. Layer 1 is the input layer and $N_1$ the number of input
neurons.
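
The summation over adjacent layers translates directly into code. The following Python sketch is illustrative; the function name is hypothetical and the layer sizes are passed as a list ordered from the input layer to the output layer.

```python
def count_interlayer_weights(layer_sizes):
    """W = sum over l of N_{l-1} * N_l for a fully interlayer connected network."""
    return sum(prev * cur for prev, cur in zip(layer_sizes, layer_sizes[1:]))
```

For layer sizes 3, 4, 2 this gives 3 * 4 + 4 * 2 = 20 weights.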

In the case that the neural network has both interlayer and intralayer connections,
a number equal to the number of possible connections within a layer has to be added
for each layer. The total number of connections thus becomes:

$$W = \sum_{l=2}^{L} N_{l-1} N_l + \left[ \frac{N_1}{2} (N_1 \pm 1) \right] + \sum_{l=2}^{L} \frac{N_l}{2} (N_l \pm 1),$$

where the part between square brackets is optional; it is deleted when no intralayer
connections are present in the input layer. The ±-symbol denotes the option of having
self-connections. If self-connections are present, addition has to be used, and
subtraction otherwise.


The number of weights in a network which, besides interlayer connections, also has
supralayer connections can be calculated by summing over all the neurons in all the
layers, multiplied by the number of neurons in all the layers of a higher index:

$$W = \sum_{l=2}^{L} N_{l-1} \sum_{m=l}^{L} N_m = \sum_{m=1}^{L-1} \sum_{l=1}^{m} N_l N_{m+1}.$$

When combining the previous two formulas, the total number of weights is obtained
for a plenary neural network, which has all three types of connections:

$$W = \left[ \frac{N_1}{2} (N_1 \pm 1) \right] + \sum_{l=2}^{L} \left( \frac{N_l}{2} (N_l \pm 1) + N_{l-1} \sum_{m=l}^{L} N_m \right),$$

where the part in square brackets is again optional and used when intralayer connections
are desired in the input layer. In the case that the intralayer connections in the input
layer are also present, the formula becomes equal to

$$W = \sum_{l=1}^{L} \frac{N_l}{2} (N_l \pm 1) + \sum_{l=2}^{L} N_{l-1} \sum_{m=l}^{L} N_m.$$

Since a plenary network can be represented as a fully connected graph, the previous
equation is equal to:

$$W = \frac{N}{2} (N \pm 1)$$

(this is the number of edges in a fully connected undirected graph with $N$ vertices),
where $N$ is the total number of neurons in the network: $N = \sum_{l=1}^{L} N_l$.
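
The equality between the layer-wise count and the closed form can be checked for concrete layer sizes. The Python sketch below is illustrative; all function names are hypothetical, and the `self_connections` flag selects the plus or the minus sign of the ± symbol.

```python
def intralayer_weights(layer_sizes, self_connections=False):
    """Connections inside each layer: N_l (N_l +/- 1) / 2 summed over all layers."""
    sign = 1 if self_connections else -1
    return sum(n * (n + sign) // 2 for n in layer_sizes)


def inter_and_supralayer_weights(layer_sizes):
    """Every neuron connected to every neuron in all layers of a higher index."""
    return sum(layer_sizes[l] * sum(layer_sizes[l + 1:])
               for l in range(len(layer_sizes)))


def plenary_weights(layer_sizes, self_connections=False):
    """Layer-wise count for a plenary network (intralayer weights in every layer)."""
    return (intralayer_weights(layer_sizes, self_connections)
            + inter_and_supralayer_weights(layer_sizes))


def complete_graph_edges(n_neurons, self_connections=False):
    """Closed form N (N +/- 1) / 2."""
    sign = 1 if self_connections else -1
    return n_neurons * (n_neurons + sign) // 2
```

For layer sizes 2, 3, 4 (N = 9) both counts give 36 weights without self-connections and 45 with them.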


Optimal  topologies 

Depending on the type(s) of interconnections present, the capacity of a neural
network can be optimized by varying the topology of the network. Plenary neural
networks are a trivial case; their topology is always optimal, since they can be seen
as a fully connected (undirected) graph, whose number of edges only depends on the
number of vertices.

For layered neural networks with only interlayer connections (the most used
topologies), the configuration topology does make a difference. Let the total number of
weights $W = \sum_{l=2}^{L} N_{l-1} N_l$, as defined before. For a two layer neural network $L = 2$,
$W = N_1 N_2$, and $N = N_1 + N_2$. The total number of weights $W$ can be represented
as a function of $N$: $W(N) = N_1 N_2$. Using $N = N_1 + N_2$, $W$ can be transformed into
a function of $N_1$: $W(N_1) = N_1 (N - N_1)$. To find the optimal topology, the derivative
of $W(N_1)$ with respect to $N_1$ has to be determined: $\frac{dW}{dN_1} = N - 2 N_1$. A maximum
is found and this gives the optimum:

$$W = \frac{N^2}{4} \quad \text{at} \quad N_1 = N_2 = \frac{N}{2}.$$

Since the number of neurons and the number of weights are integral numbers,
$W = \lfloor N^2/4 \rfloor$ and $N_2$ can be chosen $\lfloor N/2 \rfloor$ or $\lceil N/2 \rceil$.
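
The integer optimum for two layers can be confirmed by exhaustive search. This Python sketch is only an illustration; the function name is hypothetical and it assumes at least one neuron per layer, so N must be at least 2.

```python
def best_two_layer_split(n_neurons):
    """Brute-force the input-layer sizes N_1 that maximize W = N_1 * (N - N_1).

    Returns (maximum W, list of all N_1 values reaching it).
    """
    candidates = range(1, n_neurons)
    best_w = max(n1 * (n_neurons - n1) for n1 in candidates)
    best_n1 = [n1 for n1 in candidates if n1 * (n_neurons - n1) == best_w]
    return best_w, best_n1
```

For N = 7 the search returns 12 weights, reached at N_1 = 3 or N_1 = 4, which equals the floor of 49/4.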

The three layer system, with $L = 3$ and $W = N_2 (N_1 + N_3)$, gives analogously:
$W(N_1, N_2) = W(N_2, N_3) = N_2 (N - N_2)$ or
$W(N_1, N_3) = N_1 (N - N_1 - 2 N_3) + N_3 (N - N_3)$. Maximization gives a maximum at
$N_2 = \frac{N}{2}$ and $N_3 = \frac{N}{2} - N_1$. The maximum for $W$ is
again $\lfloor N^2/4 \rfloor$.

For more than three layers, the outcome of the maximization procedure is: drop
all but two or three layers, and the same maximum holds; in other words, multi-layer
systems (with more than three layers) are not optimal. This outcome coincides with the
neural network interpretation of Kolmogorov’s theorem, which states that the capabilities
of a neural network with more than three layers do not exceed the capabilities of a three
layer neural network with $2 N_1 + 1$ neurons in the hidden layer and only interlayer
connections [2].

For neural networks with interlayer plus intralayer connections, a fully connected
two layer network is equal to a two layer plenary neural network. It has
$W = N_1 N_2 + \frac{N_1 (N_1 \pm 1)}{2} + \frac{N_2 (N_2 \pm 1)}{2} = \frac{N}{2} (N \pm 1)$
weights. So there is no absolute maximum; any distribution of the neurons over the
two layers gives this “maximum”. For more than two
layers the outcome of the optimization is: drop all but one or two layers and distribute
the neurons over these layers. The maximum $W$ is therefore the same as for the plenary
neural network.

Layered neural networks with interlayer and supralayer connections have a
different optimum. Since two layer neural networks do not have supralayer
connections, the smallest networks to study here are three layer networks:
$L = 3$ and $W = N_1 N_2 + N_1 N_3 + N_2 N_3$. $W$ can be written as a function
of two variables again:
$W(N_1, N_2) = N_1 (N - N_1 - N_2) + N_2 (N - N_2)$,
$W(N_1, N_3) = N_3 (N - N_1 - N_3) + N_1 (N - N_1)$, or
$W(N_2, N_3) = N_2 (N - N_2 - N_3) + N_3 (N - N_3)$. Maximization gives a
maximum at $N_1 = N_2 = N_3 = N/3$. The maximum for $W$ is $N^2/3$. This can be
generalized and proven for any number of layers. The maximum is found at
$N_l = N/L$, for $L > 2$ and $1 \le l \le L$, and the maximum is

$$W = \frac{N^2 (L - 1)}{2L}.$$

Thus, in the case of both interlayer and supralayer connections: since the
number of neurons is a positive integer, each layer gets at least
$\lfloor N/L \rfloor$ neurons, and the rest of the neurons
($N - L \lfloor N/L \rfloor$) can be distributed over the layers. The number of
weights is also a positive integer. The floor function can only be applied for
neural networks with fewer than eight layers, since the maximum deviation
between the optimal and the actual number of weights can be as large as $L/8$
'weights'.
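
The integer rounding argument can be illustrated numerically. The sketch below
assumes that the best integer distribution is the balanced one (each layer gets
the floor of N/L neurons and the remainder is spread one neuron per layer); the
helper names are hypothetical. It compares the resulting weight count with the
continuous optimum N^2 (L - 1)/(2L).

```python
from fractions import Fraction


def supralayer_weights(sizes):
    """Weights when every pair of distinct layers is fully connected."""
    total = sum(sizes)
    # Sum over pairs i < j of N_i * N_j equals (N^2 - sum of N_i^2) / 2.
    return (total * total - sum(s * s for s in sizes)) // 2


def balanced_sizes(total, layers):
    """Give each layer floor(N/L) neurons and spread the remainder."""
    base, extra = divmod(total, layers)
    return [base + 1] * extra + [base] * (layers - extra)


def continuous_optimum(total, layers):
    """Exact value of N^2 (L - 1) / (2 L) as a rational number."""
    return Fraction((layers - 1) * total * total, 2 * layers)


if __name__ == "__main__":
    for layers, n in ((3, 10), (7, 25), (8, 12)):
        w = supralayer_weights(balanced_sizes(n, layers))
        opt = continuous_optimum(n, layers)
        print(layers, n, w, opt, opt - w)
```

The last column is the deviation between the continuous optimum and the integer
result; the case of eight layers and twelve neurons shows a deviation of one
full weight, which is why the floor of the continuous optimum no longer gives
the attainable number.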


Conclusions 

The optimal topology and maximum number of connections for all the
interconnection schemes are given in this table:

interconnection          optimal topology                  W_max
structure          L_min   L_max     N_l's
inter              2       3         $N_2 = N/2$           $\lfloor N^2/4 \rfloor$
inter & intra      1       2         any distribution      $N(N+1)/2$
inter & supra      3       no max.   $N_l = N/L$           $N^2 (L-1)/(2L)$
plenary            1       no max.   any distribution      $N(N+1)/2$

The unlimited operating rate of parallel processing systems as suggested by
various proposed architectures is questionable. Like the limitations imposed by
the von Neumann bottleneck on serial processing, it appears that we also have a
fundamental limitation on the possible ultimate speed of parallel processors.

Accepting the fact that the universal speed limit is the velocity of light, we
may estimate the time for performing a single processing step on a planar array
of $n \times n$ signal elements. The two planes $P_1$ and $P_2$ in Fig. 1
represent optical elements (transparencies, lenses, holograms, spatial light
modulators, acoustooptic cells, etc.). To perform any processing operation,
light must be propagated between these two planes to interconnect all the signal
elements of one plane with those of the second plane. These interconnections may
be achieved by waveguides or, globally, by diffraction. In any case the
processing time will be limited by the transit time of light between these two
planes. Furthermore, there will also be a skew, that is, a differential time
delay between the various interconnection paths (for example, paths 1 and 2 in
the figure).

To estimate the time delays involved in the processing we consider free space
propagation, denote the distance between the two planes by $R$, and denote the
operating aperture diameter by $D$. Assuming diffraction-limited resolution, the
diameter of each signal element (pixel) may be given by the diffraction-limited
spot size,

$$a = 2.44 \lambda R / D = 2.44 \lambda (f/\text{No.}), \tag{1}$$

where $(f/\text{No.})$ is the f/number of the optical system and $\lambda$ is
the illuminating wavelength. Thus for our $n \times n$ array we shall need an
aperture size,

$$D = n a = 2.44 n \lambda (f/\text{No.}). \tag{2}$$

The maximum delay time is induced on a diagonal trajectory,

$$T_{\max} = \frac{1}{c} (R^2 + D^2)^{1/2}, \tag{3}$$

where $c$ is the velocity of light, while the minimal time is

$$T_{\min} = R / c. \tag{4}$$

These two time delays may be expressed by the f/No. using Eqs. (1) and (2):

$$T_{\max} = \frac{2.44}{\nu} n (f/\text{No.}) [1 + (f/\text{No.})^2]^{1/2}, \tag{5}$$

$$T_{\min} = \frac{2.44}{\nu} n (f/\text{No.})^2, \tag{6}$$

where $\nu$ is the frequency of the illuminating light. From the last two
equations one may also derive the skew:

$$\tau = T_{\max} - T_{\min} = \frac{2.44}{\nu} n (f/\text{No.})^2 \{[1 + (f/\text{No.})^{-2}]^{1/2} - 1\}. \tag{7}$$

Fig. 1. General building block of an optical system: two planes with separation
R and effective aperture D interconnected by propagating light. The two rays
represent maximal and minimal path length in a diffractive interconnection.

To get an idea about the magnitude of these time delays let us assume visible
illumination with a frequency of $5 \times 10^{14}$ Hz, an f/No. of 2, and an
array of the order of a TV frame with $n = 500$. Substitution of these numbers
results in $T_{\min} = 10$ ps and $\tau = 1.2$ ps. In an actual system involving
a number of processing planes and possibly waveguides, these numbers may have to
be multiplied by an appreciable factor. For example, a simple optical correlator
(4f system) has a factor of 4, leading to a differential delay of 4.8 ps with a
total delay of 40 ps.
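
The numerical estimate can be reproduced directly from the expressions for the
minimum delay, the maximum delay, and the skew. The following is a minimal
sketch; the function and parameter names are chosen here for illustration only.

```python
import math


def transit_times(n, f_number, frequency_hz):
    """Return (T_min, T_max, skew) in seconds for an n x n array.

    Implements T_min = 2.44 n F^2 / nu and T_max = 2.44 n F sqrt(1 + F^2) / nu,
    where F is the f/number and nu the optical frequency.
    """
    scale = 2.44 * n / frequency_hz
    t_min = scale * f_number ** 2
    t_max = scale * f_number * math.sqrt(1.0 + f_number ** 2)
    return t_min, t_max, t_max - t_min


if __name__ == "__main__":
    t_min, t_max, skew = transit_times(n=500, f_number=2.0, frequency_hz=5e14)
    print(f"T_min = {t_min * 1e12:.2f} ps")
    print(f"T_max = {t_max * 1e12:.2f} ps")
    print(f"skew  = {skew * 1e12:.2f} ps")
```

With the values quoted in the text, the minimum delay comes out just under 10 ps
and the skew just under 1.2 ps, in agreement with the rounded figures above.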

The above time delays are quite small compared to current processing facilities
and presently available device responses. However, considering proposed
operation with femtosecond pulses, these delays may become the ultimate limiting
factors. The overall time delay need not concern pipelined systems, but it may
become quite important in complex architectures such as those involving feedback
loop operations. Equation (7) indicates that the overall time delay increases
with increasing f/No. while the skew approaches the limit $1.22 n / \nu$. Thus
these effects should be taken into consideration for very high speed
architectures. For example, by using optical fibers or other guiding elements
one may solve the skew problem, but by doing this the pipelining delays will be
increased.

In conclusion, we note that the limiting time factors were estimated for thin
optical elements in free space. The fact that the vacuum velocity of light is a
universal speed limit may indicate that we are dealing here with a universal
bottleneck influencing all possible approaches to parallel signal processing.
This bottleneck is proportional to the operating wavelength, suggesting that
computing with visible or IR light may be just an intermediary step toward an
even more advanced approach.
I. OBJECTIVE

The objective of this research is to advance the performance characteristics
and applications of compact integrated acousto-optic and acousto-electro-optic
Bragg modulator modules. The following specific research tasks were proposed and
pursued:

1. Analysis on Acousto-optic Bragg Diffraction in Channel-Planar Waveguide

2. Identification of Existing and New Architectures and Algorithms

3. Comparison between Acousto-optic and Electro-optic Modulation/
Multiplication Schemes

During the course of this research significant progress was made in each task.


II. ACCOMPLISHMENTS

A summary of accomplishments on each task now follows:

1. Analysis on Acousto-optic Bragg Diffraction in Channel-Planar Waveguide

Fig. 1 shows the configuration of the integrated acousto-optic (AO) Bragg
modulator module [1] that has been analyzed. An array of light beams coupled
into the channel-waveguide array at the input endface of the LiNbO3
channel-planar composite waveguide is expanded and collimated by the
titanium-indiffused proton-exchanged (TIPE) waveguide lens array [2] before
incidence upon the surface acoustic waves (SAW) generated by the interdigital
SAW transducer. The array of Bragg-diffracted light beams is then collected and
focused upon the output endface of the composite waveguide by the large-aperture
TIPE lens. By varying the carrier frequency of the rf driving signal, the
Bragg-diffracted light beams are scanned along the output endface.

At the outset, a potential distinction between the AO interaction geometry
under consideration and the conventional one, which involves a single SAW and a
single light beam in a purely planar waveguide substrate [3], was identified.
This potential distinction was based on the expectation that optical anisotropy
and the very small aperture of the multiple incident light beams (for example,
5 to 10 μm), and thus the resulting spreading of the light beams (by
diffraction), would significantly affect the performance characteristics of the
device module. However, a subsequent numerical calculation shows that, to a good
approximation, the spreading angle can be determined using a conventional
formula and that, since the microlens array is placed at a short distance from
the output edges of the channel-waveguide array, optical anisotropy has no
significant effect. Accordingly, it has been concluded that the ultimate
performance characteristics of the integrated AO Bragg modulator module, such as
diffraction efficiency, rf bandwidth, rf drive power, nonlinearity, and dynamic
range, are practically identical to those of a conventional AO Bragg modulator
in a planar waveguide [3].

2. Identification of Existing and New Architectures and Algorithms

The integrated AO Bragg modulator module of Fig. 1 was found to be rather
inconvenient and limited in applications such as matrix-vector and matrix-matrix
multiplications, as one set of input data must be used to modulate the input
light beams. This is so because laser arrays (such as diode laser arrays) with
capability for independent modulation of each individual laser are not
commercially available. Consequently, much effort was made to identify and
explore other new architectures.

The two device architectures that have been identified and explored are shown
in Figs. 2 and 3. The basic architecture common to both modules is a composite
waveguide in which a channel-waveguide array, a planar waveguide, a linear TIPE
microlens array, Bragg modulator arrays, and a large-aperture TIPE lens are
integrated in a common LiNbO3 substrate. The channel waveguide array (only four
elements are shown) is aligned with the linear microlens array. The two device
modules presented in this report utilize, respectively, a herringbone Bragg
electrode array (Fig. 2) and a combination of a SAW transducer and a
conventional Bragg electrode array (Fig. 3). The microlens array was used to
capture, expand, and collimate the multiple light beams from the
channel-waveguide array before their incidence upon the resulting electro-optic
(EO) and AO-EO Bragg diffraction gratings, while the large-aperture lens
collects and focuses the multiple Bragg-diffracted light beams upon a
photodetector. In operation, "multiplication" of data is carried out by the
Bragg modulators, while "addition" of the resulting products is carried out by
the large-aperture integrating lens.

Since this particular program had not provided any funds for fabrication and
testing of the two device modules, actual design, fabrication and testing were
subsequently carried out through other programs. Some of the experimental
results have been published [4, 5].

In summary, the two device architectures identified and explored have been
shown to be capable of realizing high-packing-density multichannel integrated
optic modules with applications to data processing and computing, including
programmable correlation of binary sequences [6].


3. Comparison between Acousto-optic and Electro-optic Modulation/
Multiplication Schemes

A. Acousto-optic Modulation/Multiplication Scheme

Efficient and wideband AO Bragg diffraction by the SAW was achieved in the
integrated AO-EO modules. In contrast to their EO counterparts, these integrated
AO modules have the unique capability to input the data in a pipeline fashion
via the SAW. Since the number of operations per second increases with the number
of input light beams, it is desirable to design and fabricate large arrays of
channel waveguides and microlenses with as small an aperture as possible. Using
60 μm as the aperture of the linear microlens array, the possible number of
light channels will be as large as 333 for a SAW propagating path of 2.0 cm.
Since the velocity of a Z-propagating SAW in Y-cut LiNbO3 is $3.5 \times 10^5$
cm/sec, the corresponding flow rate for the data is approximately 60 MHz.
Naturally, if the aperture of each microlens element is reduced to 30 μm, both
the number of light channels and the data flow rate will be increased by a
factor of two. A specific application of the AO module to optical systolic array
processing and computing [7], namely, matrix-vector multiplication, was
successfully carried out.
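
The channel count and data flow rate quoted above follow from simple ratios. The
sketch below assumes that the channel count is the SAW path length divided by
the microlens aperture and that the data flow rate is the SAW velocity divided
by the same aperture; this reading reproduces the quoted figures, and the names
used are illustrative.

```python
SAW_PATH_UM = 20_000.0           # 2.0 cm SAW propagation path, in micrometres
SAW_VELOCITY_UM_PER_S = 3.5e9    # 3.5e5 cm/sec expressed in micrometres per second


def channel_count(path_um, aperture_um):
    """Whole number of microlens apertures that fit along the SAW path."""
    return int(path_um // aperture_um)


def data_rate_hz(velocity_um_per_s, aperture_um):
    """Rate at which the SAW advances by one aperture width."""
    return velocity_um_per_s / aperture_um


if __name__ == "__main__":
    for aperture_um in (60.0, 30.0):
        channels = channel_count(SAW_PATH_UM, aperture_um)
        rate_mhz = data_rate_hz(SAW_VELOCITY_UM_PER_S, aperture_um) / 1e6
        print(f"aperture {aperture_um:.0f} um: {channels} channels, "
              f"{rate_mhz:.1f} MHz")
```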
B. Electro-optic Modulation/Multiplication Scheme

As shown in Fig. 2, the integrated EO Bragg modulator module results from
replacing the SAW-generated AO diffraction grating with an array of EO Bragg
diffraction gratings that are created by applying voltages across an array of
interdigital finger electrodes. Efficient and wideband Bragg diffraction has
been achieved using electrode arrays with 13 μm periodicity and 2.0 mm aperture.
Specifically, 95% diffraction at a drive voltage of 6.0 volts and 870 MHz rf
bandwidth were measured [4]. It is important to note that the two separate
electrode arrays of the herringbone type facilitate application and thus
multiplication of two independent sets of data. Thus, in contrast to their AO
counterparts, these integrated EO modules can accept multiple sets of data, and
at a much higher rate than is possible with the SAW. This capability has
been  utilized  to  perform  matrix-matrix  multiplication^] . 
1. INTRODUCTION

It is generally agreed that in the realm of computational linear algebra,
particularly the multiplication of two matrices, optical computing has an
inherent speed of execution advantage over digital electronics (but see
Section 4). Investigators in optical computing have generally taken matrix
multiplication algorithms directly from the mathematical literature and modified
them for use in optical computing; some representative papers are [1-4].
Alternatively, optical architectures have been developed to carry out such
computations, e.g. [5-11].

One purpose of the present communication is to describe our polynomial
convolution algorithm, which is an ab initio development of matrix
multiplication for use in optical computing. A second purpose is to consider the
situation where the matrices are so large that they cannot be stored
simultaneously on optical masks (hereafter termed the storage problem). As we
will show in Section 4, the speed advantage of the methods advocated in [1-4] is
compromised because the matrix elements are not equally accessible. Furthermore,
we make plausible that the polynomial convolution algorithm is robust with
respect to this debilitating situation, in that it is still possible to obtain a
reasonable concurrency over the more classical algorithms because of the
simplified bookkeeping and modular structure of the convolution algorithm.
2. POLYNOMIAL CONVOLUTION ALGORITHM

In view of the initial complexity of the algorithm we proceed in three stages.
In the first stage we give the explicit expressions, and in the second stage we
verify these formulae. Finally, we outline a construction which leads to the
various formulae.

We begin by considering the matrix product $C = AB$ where $A$ is of size
$n_1 \times n_2$, $B$ is of size $n_2 \times n_3$, and $C$ is of size
$n_1 \times n_3$, with corresponding matrix elements $a_{ij}$, $b_{jk}$, and
$c_{ik}$. Let $x$ be an indeterminate, and associate with $A$ and $B$ the
polynomials $P(x)$ and $Q(x)$:

$$P(x) = \sum_{s=0}^{(n_1 - 1) n_2 n_3 + n_2 - 1} p_s x^s, \tag{2.1}$$

$$Q(x) = \sum_{t=0}^{n_2 n_3 - 1} q_t x^t. \tag{2.2}$$

Note that the degree of $P(x)$ is $(n_1 - 1) n_2 n_3 + n_2 - 1$, which involves
not only the size of $A$ through $n_1$ and $n_2$ but also the size of $B$
through $n_3$. The degree of $Q(x)$ is $n_2 n_3 - 1$ and only involves the size
of $B$, namely $n_2$ and $n_3$. The $p$ and $q$ coefficients are related to the
matrix elements of $A$ and $B$ by

$$p_s = a_{ij}, \quad \text{if } s = (i - 1) n_2 n_3 + j - 1 \tag{2.3a}$$

$$p_s = 0, \quad \text{if } (i - 1) n_2 n_3 + n_2 \le s < i n_2 n_3 \tag{2.3b}$$

and

$$q_t = b_{jk}, \quad \text{if } t = k n_2 - j \tag{2.4a}$$

$$q_t = 0, \quad \text{if } t \ge n_2 n_3 \tag{2.4b}$$

with $1 \le i \le n_1$, $1 \le j \le n_2$ and $1 \le k \le n_3$.

We claim that the elements of the matrix product $C$ are given by selected
coefficients of the polynomial

$$R(x) = P(x) Q(x) = \sum_{m=0}^{n_1 n_2 n_3 - 1} r_m x^m \tag{2.5}$$

where

$$r_m = \sum_{s=0}^{m} p_s q_{m-s} \tag{2.6}$$

is the discrete convolution of the $p$ and $q$ coefficients. These selected
$r_m$ are given by

$$r_m = c_{ik}, \quad \text{if } m = (i - 1) n_2 n_3 + k n_2 - 1. \tag{2.7}$$

A formal proof (which is really a verification of the formulae) is now given.
We begin by rewriting Eq. (2.6) in the form

$$r_m = \sum_{s} p_s q_{m-s} = \sum_{\alpha, \beta, \gamma, \delta} a_{ij} b_{jk} \tag{2.8}$$

where the summation in the second series is over:

$$\alpha: \; s = (i - 1) n_2 n_3 + j - 1 \tag{2.9a}$$

$$\beta: \; (i - 1) n_2 n_3 \le s < (i - 1) n_2 n_3 + n_2 \tag{2.9b}$$

$$\gamma: \; t = m - s = k n_2 - j \tag{2.9c}$$

$$\delta: \; t < n_2 n_3. \tag{2.9d}$$

The $\alpha$ term is simply Eq. (2.3a), while the $\beta$ term is the negation
of Eq. (2.3b). The $\gamma$ term follows from Eq. (2.4a), while the $\delta$
term is the negation of Eq. (2.4b). Upon substitution of the $\alpha$ term into
the $\beta$ inequality we immediately see that this can only be true if

$$1 \le j \le n_2. \tag{2.10}$$

In like fashion, substitution of the $\gamma$ term into the $\delta$ inequality
leads to the requirement that

$$m = (i - 1) n_2 n_3 + k n_2 - 1 \tag{2.11}$$

which is Eq. (2.7). Thus the formulae are verified.

A construction which leads to the various formulae for $p_s$ and $q_t$ in terms
of $a_{ij}$ and $b_{jk}$, respectively, uses row vectors. Consider a row vector
$p$ whose elements we denote by $p_s$ (coefficients of the polynomial $P(x)$),
composed of the matrix elements $a_{ij}$ of $A$ and strings of zeros as depicted
in Fig. 1A. The range of $s$ is

$$0 \le s \le n_1 n_2 n_3 - n_2 n_3 + n_2 - 1; \tag{2.12}$$

consequently

$$p_s = 0, \quad \text{if } s \ge (n_1 - 1) n_2 n_3 + n_2 \tag{2.13a}$$

$$p_s = 0, \quad \text{if } (i - 1) n_2 n_3 + n_2 \le s < i n_2 n_3. \tag{2.13b}$$

Furthermore the $p_s$ are related to the $a_{ij}$ as given by Eq. (2.3a), as the
reader can verify by construction.
In like fashion, we construct another row vector $q$ with elements $q_t$
according to Fig. 1B. Unlike $p$, $q$ has no strings of zero elements. The range
of $t$ is

$$0 \le t \le n_2 n_3 - 1 \tag{2.14}$$

so that

$$q_t = 0, \quad \text{if } t \ge n_2 n_3. \tag{2.15}$$

Within the range of $t$, the $q_t$ are related to the $b_{jk}$ by

$$q_t = b_{jk}, \quad \text{if } t = (k - 1) n_2 + n_2 - j \tag{2.16}$$

which reduces to Eq. (2.4a).

As an illustrative example of the algorithm, consider the case where $A$ is
$2 \times 2$ and $B$ is $2 \times 3$, so that $C$ is $2 \times 3$ (i.e.,
$n_1 = 2$, $n_2 = 2$, $n_3 = 3$). The upper limits on the polynomials $P$, $Q$
and $R$ are 7, 5, and 11, respectively. The $p_s$, $q_t$ and $r_m$ coefficients
evaluated according to Eqs. (2.3), (2.4) and (2.7) are listed in Table 1. Upon
carrying out the convolution operation, Eq. (2.6), in conjunction with this
table we have:

$$r_1 = c_{11} = p_0 q_1 + p_1 q_0 = a_{11} b_{11} + a_{12} b_{21} \tag{2.17a}$$

$$r_3 = c_{12} = p_0 q_3 + p_1 q_2 = a_{11} b_{12} + a_{12} b_{22} \tag{2.17b}$$

$$r_5 = c_{13} = p_0 q_5 + p_1 q_4 = a_{11} b_{13} + a_{12} b_{23} \tag{2.17c}$$

$$r_7 = c_{21} = p_6 q_1 + p_7 q_0 = a_{21} b_{11} + a_{22} b_{21} \tag{2.17d}$$

$$r_9 = c_{22} = p_6 q_3 + p_7 q_2 = a_{21} b_{12} + a_{22} b_{22} \tag{2.17e}$$

$$r_{11} = c_{23} = p_6 q_5 + p_7 q_4 = a_{21} b_{13} + a_{22} b_{23} \tag{2.17f}$$

These  are,  of  course,  the  matrix  elements  as  obtained  by  more  standard  procedures. 

This  completes  our  description  of  the  algorithm. 
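
The algorithm can be exercised numerically. The following is a minimal sketch in
plain Python, assuming matrices stored as lists of rows; the function name is
illustrative. It encodes A and B into the coefficient vectors p and q following
Eqs. (2.3a) and (2.4a), forms the discrete convolution of Eq. (2.6), and reads
off the elements of C at the positions given by Eq. (2.7).

```python
def poly_conv_matmul(a, b):
    """Multiply matrices a (n1 x n2) and b (n2 x n3) by polynomial convolution."""
    n1, n2, n3 = len(a), len(b), len(b[0])

    # p_s = a_ij at s = (i - 1) n2 n3 + j - 1; all other entries stay zero.
    p = [0] * ((n1 - 1) * n2 * n3 + n2)
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            p[(i - 1) * n2 * n3 + j - 1] = a[i - 1][j - 1]

    # q_t = b_jk at t = k n2 - j (each column of b written in reverse).
    q = [0] * (n2 * n3)
    for j in range(1, n2 + 1):
        for k in range(1, n3 + 1):
            q[k * n2 - j] = b[j - 1][k - 1]

    # Discrete convolution r_m = sum_s p_s q_(m - s).
    r = [0] * (len(p) + len(q) - 1)
    for s, p_s in enumerate(p):
        for t, q_t in enumerate(q):
            r[s + t] += p_s * q_t

    # c_ik = r_m at m = (i - 1) n2 n3 + k n2 - 1.
    return [[r[(i - 1) * n2 * n3 + k * n2 - 1] for k in range(1, n3 + 1)]
            for i in range(1, n1 + 1)]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6, 7], [8, 9, 10]]
    print(poly_conv_matmul(A, B))
```

The sketch computes the full convolution, although only the coefficients
selected by Eq. (2.7) are used; an optical implementation would obtain the
convolution in the correlator rather than by the double loop shown here.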
3. IMPLEMENTATION AND PARALLELISM OF ALGORITHM

In spite of the complicated looking nature of the algorithm, its implementation
in optical computing can be carried out in straightforward fashion.

Examination of Fig. 1A shows that the matrix elements $a_{ij}$ of $A$ coded
into the vector $p$ consist of the rows of $A$ between which strings of zeros
are interspersed. Thus all we need to do to handle $A$ in this algorithm is to
store it on an optical mask according to Fig. 1A. The vector $q$ containing the
matrix elements $b_{jk}$ is simply the columns of $B$, each written in reverse
order; see Fig. 1B. Obviously we need only code $B$ as per Fig. 1B on an optical
mask for this aspect of the implementation. Given that both these operations
have been carried out, we then proceed according to the various formulae quoted
in the previous section.

The parallelism of the algorithm (assuming that all the matrix elements of $A$
and $B$ can be stored in primary storage) manifests itself through the
corresponding $p$ and $q$ vectors. This is best seen by examination of Table 1;
the first two components of $p$ (i.e., $a_{11}$ and $a_{12}$) can be combined
simultaneously with $(b_{21}, b_{11})$, $(b_{22}, b_{12})$, and
$(b_{23}, b_{13})$ of the vector $q$. While these operations are being carried
out, the last two (nonzero) elements of $p$ (i.e., $a_{21}$ and $a_{22}$) are to
be combined with $(b_{21}, b_{11})$, $(b_{22}, b_{12})$, and $(b_{23}, b_{13})$.
Thus we are able to carry out the manipulations leading to the six matrix
elements of $C$ simultaneously. The general case of two rectangular matrices
does not require any detailed comment. Consequently, the polynomial convolution
algorithm is at least as fast as the methods advocated in [2, 4] under the
assumed conditions of equally accessible matrix elements.
4. INFLUENCE OF STORAGE PROBLEM ON ALGORITHM PARALLELISM IN
MATRIX MULTIPLICATION

Although the issue of matrix multiplication, in the context of optical
computing, has been cast as one of speed of execution of manipulations, this is
only one aspect of the problem, as we will now see. Realistic signal processing
requirements demand very large matrices in order to achieve the resolutions
necessary to fulfill the desired goals. Because such large matrices are needed,
we must study the effect of storage (that is, the extent to which all matrix
elements in the two matrices under multiplication are not equally accessible) on
the inherent parallelism, and hence speed, of the various algorithms proposed.

When the matrices are small (for convenience we will let them both be square
and of size $n \times n$), the entire arrays containing the matrix elements of
$A$ and $B$ can reside simultaneously in primary storage in the form of matrix
masks as described in Goodman [12]; it is then possible to carry out all of the
manipulations described in the algorithms promulgated in [1-4]. Under the small
$n$ regime, it is essentially true that all matrix elements are equally
accessible. In fact, all the papers that we have succeeded in locating on matrix
multiplication (via optical computing) tacitly make the assumption that all
matrix elements are equally accessible, independent of $n$.

Let us consider, for example, the inner, intermediate, and outer product
methods for the multiplication of matrices. Reference is made to Appendix A for
the development of an efficient formalism that yields these representations.
Examination of these representations reveals that it is possible to perform the
matrix-matrix product at two levels of parallelism. At the first level, the
intermediate product methods speed up the execution over the inner product
method by a factor of $n$. At the second level, the outer product method
achieves a factor of $n^2$ over the inner product method. In fact, there are $n$
parallel multiplications and $(n - 1)$ parallel additions to be performed,
rather than the $n^3$ sequential multiplications and $(n^3 - n^2)$ sequential
additions required by the original element level algorithm. Unfortunately, when
$n$ is large, the entire arrays cannot reside in primary storage, but only
portions thereof. This means that the speed advantage of the outer product
method is now lost when computing large matrices, because the matrix elements
are not equally accessible! A second tacit assumption is that all arithmetical
operations of the same type are equivalent both in cost and in accuracy. This
too is violated when $n$ is large.

Thus we cannot simply dismiss the use of the intermediate product
representations when $n$ is large. To improve the efficiency of the computation
in this situation, it is necessary to maximize the use that is made of the
matrix element data on a given matrix mask (containing parts of $A$ or $B$)
while it is in primary storage. It is probably advantageous to store matrix
elements by columns. This is precisely what the column intermediate
representation does: $Ce_j$ is formed as a linear combination of the $Ae_k$ with
combination coefficients drawn from $Be_j$. Obviously one can choose to store
rows, so that the row intermediate representation is appropriate. In this
scenario, we can only achieve a factor of $n$ in the parallelism in order to
accommodate the storage problem. There is also the bookkeeping question as to
efficient storage and subsequent manipulation of the matrix elements in
accordance with the particular algorithm requirements. Reference is made to
Hockney and Jesshope [13] for an overview of such considerations in digital
electronic computers.
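
The column intermediate representation can be made concrete with a short
sketch. It assumes matrices stored as lists of rows and uses illustrative
function names; each column of C is built as a linear combination of the columns
of A, with the coefficients taken from the matching column of B, so that only
columns need to be held at any one time.

```python
def column(matrix, index):
    """Return one column of a matrix stored as a list of rows."""
    return [row[index] for row in matrix]


def matmul_by_columns(a, b):
    """Form C = AB column by column: C e_j = sum over k of (A e_k) b_kj."""
    n1, n2, n3 = len(a), len(b), len(b[0])
    c_columns = []
    for j in range(n3):
        coefficients = column(b, j)          # the entries of B e_j
        result = [0] * n1
        for k in range(n2):
            a_col = column(a, k)             # the vector A e_k
            result = [acc + coefficients[k] * x for acc, x in zip(result, a_col)]
        c_columns.append(result)
    # Reassemble the columns into a list of rows.
    return [[c_columns[j][i] for j in range(n3)] for i in range(n1)]


if __name__ == "__main__":
    print(matmul_by_columns([[1, 2], [3, 4]], [[5, 6, 7], [8, 9, 10]]))
```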

One possible solution for increasing parallelism when $n$ is large is via
partitioning. The idea is certainly not new, as witness the recent paper by
Caulfield et al. [3], who choose to use $2 \times 2$ matrices for the
partitioning. Another viable approach, using the formalism of Appendix A, is the
following. Suppose that $A$, $B$ and $C$ are partitioned into submatrices. This
means that the partitioning of the rows of $A$ and those of $C$ is the same,
that the partitioning of the columns of $B$ and those of $C$ is the same, and
that the partitioning of the columns of $A$ and of the rows of $B$ is the same.
The matrix product can then be formed blockwise. The foregoing remains valid if
transcribed by replacing $e_i$ by $E_i$, etc., where $E_i$ is the $i$-th block
column of the appropriately partitioned identity matrix, the appropriate
partitioning being that which is symmetric with respect to rows and columns for
the matrix multiplication in question. Consequently, we recognize $AE_j$ as the
$j$-th block column of $A$, $E_i^{+}A$ as the $i$-th block row of $A$, and
$E_i^{+}AE_j$ as the $(i,j)$-th block element of $A$; thus we have

$$I = \sum_k E_k E_k^{+}. \tag{4.1}$$

It may be possible to store large matrices in partitioned form, with the
natural units to be stored and manipulated being the submatrices constituting
the blocks.
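
A blockwise product of this kind can be sketched as follows. The example
assumes matrices stored as lists of rows and partitions described by lists of
index ranges; the function names and the chosen partitions are illustrative
only. Each block of C is accumulated from products of one block of A and one
block of B, so only those two submatrices are needed at a time.

```python
def matmul(a, b):
    """Ordinary element level product of two small matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def block(matrix, rows, cols):
    """Extract the submatrix with the given row and column indices."""
    return [[matrix[i][j] for j in cols] for i in rows]


def blockwise_matmul(a, b, row_parts, inner_parts, col_parts):
    """Form C = AB one block at a time.

    row_parts partitions the rows of A (and C), col_parts the columns of B
    (and C), and inner_parts the columns of A together with the rows of B.
    """
    c = [[0] * len(b[0]) for _ in a]
    for rows in row_parts:
        for cols in col_parts:
            for inner in inner_parts:
                partial = matmul(block(a, rows, inner), block(b, inner, cols))
                for bi, i in enumerate(rows):
                    for bj, j in enumerate(cols):
                        c[i][j] += partial[bi][bj]
    return c


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    parts = [range(0, 2), range(2, 3)]
    print(blockwise_matmul(A, B, parts, parts, parts))
```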

What of the other approaches as influenced by the storage problem? The
reduction to an equivalent matrix-vector problem advocated by Barakat [4]
suffers the same fate as the outer product representation when $n$ is large, in
that all the matrix elements are not equally accessible. Reference to [4], see
Eq. (1), shows that the Roth column decomposition of $AB$ contains replicas of
the matrix $A$ along the principal diagonal, so that in this version all the
matrix elements cannot be held in primary storage. Thus for large $n$, the
parallelism inherent in the general reduction to the Roth column decomposition
for matrix-vector multiplications is inhibited. However, there is also a Roth
row decomposition of $AB$, see Eq. (4) of [4], in which the matrix elements of
$A$ are now spread along diagonals. It was hoped, in view of the previous work
by Madsen et al. [14] on matrix multiplication by diagonals, that the storage
problem could be circumvented. A detailed analysis, which we need not reproduce,
indicates that the row decomposition is no more efficient than the column
decomposition as regards the primary storage of matrix elements.

Finally  we  come  to  the  algorithm  of  the  present  paper.  The  implementation  of  the 
algorithm  as  discussed  in  Section  3  bears  directly  upon  the  storage  problem.  When  the 
matrices  are  large  enough  to  violate  the  equal  accessibility  condition,  we  can  still  maintain 
a  reduced  degree  of  parallelism  because  the  convolution  algorithm  does  not  require  the 
rather complicated bookkeeping that the column middle product decomposition necessitates
before calculations can be carried out. Even though we cannot simultaneously store
all the matrix elements of A and B, the convolution algorithm only requires the rows of
A to be stored on separate optical masks so they can interact with the successive columns
(in reverse order) of B, sequentially stored on optical masks, to produce the various rows of
C. Consequently, when both A and B are large, we can still maintain a degree of parallelism
because we do not require all the matrix elements of A and B to be in primary
storage  simultaneously.  All  we  need  in  primary  storage  are  the  respective  row  and  column 
of  A  and  B.  Thus,  the  polynomial  convolution  algorithm  seems  to  be  more  immune 
to  the  storage  problem  than  do  the  algorithms  in  [2,4].  This  is  because  both  the  outer 
product  and  Kronecker  product  decomposition  algorithms  are  not  modular  in  structure: 
if  the  equal  accessibility  condition  is  violated  there  is  no  way  to  patch  them  up  to  work  in 
the  situation  where  the  matrices  are  very  large.  It  may  be  possible  to  employ  partitioning 
as described in [3] or in the present paper; however, the bookkeeping is probably going to
be a significant obstacle.
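
The coefficient layout of Table I can be sketched in a few lines of code. The version below is a minimal illustration, not the optical implementation: it assumes the index pattern of the 2×2 by 2×3 case extends to an n×k matrix A and a k×l matrix B, with a_ij placed at index (i-1)kl + (j-1) of p, the columns of B stored one after another in reverse order in q, and c_ic read from index (i-1)kl + (c-1)k + (k-1) of r. The positions of b21 and b11 at indices 0 and 1 of q are inferred from the requirement that r_1 equal c11. The function names are hypothetical.

```python
import numpy as np

def encode_p(A, l):
    """Place a_ij at index i*k*l + j (0-based i, j); all other coefficients are zero."""
    n, k = A.shape
    p = np.zeros((n - 1) * k * l + k)
    for i in range(n):
        for j in range(k):
            p[i * k * l + j] = A[i, j]
    return p

def encode_q(B):
    """Store the columns of B one after another, each column in reverse order."""
    k, l = B.shape
    q = np.zeros(k * l)
    for c in range(l):
        for j in range(k):
            q[c * k + (k - 1 - j)] = B[j, c]
    return q

def convolution_matmul(A, B):
    """Compute C = AB by one polynomial multiplication (a discrete convolution)."""
    n, k = A.shape
    k2, l = B.shape
    assert k == k2, "inner dimensions must agree"
    r = np.convolve(encode_p(A, l), encode_q(B))  # coefficients of r(x) = p(x) q(x)
    C = np.empty((n, l))
    for i in range(n):
        for c in range(l):
            # Only these coefficients of r carry elements of C; the rest are discarded.
            C[i, c] = r[i * k * l + c * k + (k - 1)]
    return C
```

Under the same assumptions, convolving a single row of A with `encode_q(B)` yields the corresponding row of C at indices k-1, 2k-1, and so on, so only one row of A needs to be held at a time.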
APPENDIX A


The  purpose  of  this  appendix  is  to  outline  an  efficient  formalism  (due  to  our  colleague 
D.  G.  M.  Anderson,  unpublished)  describing  the  inner  product,  intermediate  product,  and 
outer  product  representations  of  matrix  multiplication.  We  further  employ  this  formalism 
to discuss matrix partitioning, see Eq. (4.1).

To  begin  we  avoid  unnecessary  complications  by  assuming  that  the  two  matrices,  call 
them A and B, are square. It is also convenient to use the vector $e_k$, which is the k-th
column of the unit matrix, i.e.,

$$I = \sum_{k=1}^{n} e_k e_k^+ , \qquad (A.1)$$

and the plus sign denotes the transpose (thus $e_k^+$ is a row vector). Given the square
matrix A, we have

    j-th column of A = $A e_j$

    i-th row of A = $e_i^+ A$

    (i,j)-th element of A = $e_i^+ A e_j$.

The  usual  element  representation  of  the  matrix  product  C  =  AB  reads  in  the  above 
notation 


$$e_i^+ C e_j = \sum_k (e_i^+ A e_k)(e_k^+ B e_j) . \qquad (A.2)$$

The  element  representation  is  the  old  fashioned  way  that  matrices  were  multiplied  before 
high  level  programming  languages  were  invented. 

To obtain the inner product representation, we begin with the element representation,
Eq. (A.2), and delete the parentheses on the right hand side, thus

$$e_i^+ C e_j = \sum_k e_i^+ A e_k e_k^+ B e_j = (e_i^+ A) \Big[ \sum_k e_k e_k^+ \Big] (B e_j) = (e_i^+ A)(B e_j) . \qquad (A.3)$$


The  reason  it  is  termed  the  inner  product  representation  is  that  the  matrices  A  and  B 
are  sandwiched  between  the  unit  vectors. 

At  the  other  extreme,  we  have  the  outer  product  representation  which  we  obtain  in 
the following fashion from the element representation, Eq. (A.2):

$$e_i^+ C e_j = \sum_k e_i^+ A e_k e_k^+ B e_j = e_i^+ \Big[ \sum_k (A e_k)(e_k^+ B) \Big] e_j . \qquad (A.4)$$


Consequently 


$$C = \sum_k (A e_k)(e_k^+ B) . \qquad (A.5)$$

The  reason  it  is  termed  the  outer  product  representation  is  that  the  matrices  A  and  B 
now  reside  at  the  extreme  left  and  right  of  the  summation.  This  expression  can  be  shown 
to be equivalent to the expression given in Athale and Collins [2], see their Eq. (2).

We next consider two intermediate representations which we term the column intermediate
product representation and the row intermediate product representation. We return
again to Eq. (A.2):

$$e_i^+ C e_j = \sum_k e_i^+ A e_k e_k^+ B e_j = e_i^+ \sum_k (A e_k)[e_k^+ (B e_j)] , \qquad (A.6)$$
or

$$C e_j = \sum_k (A e_k)[e_k^+ (B e_j)] . \qquad (A.7)$$


This  is  the  column  intermediate  product.  The  corresponding  row  intermediate  product  is 


$$e_i^+ C e_j = \sum_k [(e_i^+ A) e_k](e_k^+ B) e_j , \qquad (A.8)$$


or 


$$e_i^+ C = \sum_k [(e_i^+ A) e_k](e_k^+ B) . \qquad (A.9)$$


It is a straightforward exercise to extend the above formalism to accommodate rectangular
matrices; we omit the details.
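
The four representations can be checked numerically. The following is a minimal sketch for square matrices, with the unit vectors $e_k$ built explicitly so that each function mirrors the grouping of one equation of this appendix; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def unit_vectors(n):
    """Columns e_k of the n x n unit matrix, each shaped (n, 1)."""
    I = np.eye(n)
    return [I[:, [k]] for k in range(n)]

def inner_product_form(A, B):
    # (A.3): c_ij = (e_i^+ A)(B e_j), one row-times-column product per element.
    n = A.shape[0]
    e = unit_vectors(n)
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            C[i, j] = ((e[i].T @ A) @ (B @ e[j])).item()
    return C

def outer_product_form(A, B):
    # (A.5): C = sum_k (A e_k)(e_k^+ B), a sum of n rank-one matrices.
    n = A.shape[0]
    e = unit_vectors(n)
    return sum((A @ e[k]) @ (e[k].T @ B) for k in range(n))

def column_intermediate_form(A, B):
    # (A.7): C e_j = sum_k (A e_k)[e_k^+ (B e_j)], built one column at a time.
    n = A.shape[0]
    e = unit_vectors(n)
    cols = [sum((A @ e[k]) * (e[k].T @ (B @ e[j])).item() for k in range(n))
            for j in range(n)]
    return np.hstack(cols)

def row_intermediate_form(A, B):
    # (A.9): e_i^+ C = sum_k [(e_i^+ A) e_k](e_k^+ B), built one row at a time.
    n = A.shape[0]
    e = unit_vectors(n)
    rows = [sum(((e[i].T @ A) @ e[k]).item() * (e[k].T @ B) for k in range(n))
            for i in range(n)]
    return np.vstack(rows)
```

All four functions perform the same n^3 scalar multiplications; they differ only in which operands must be available together, which is what matters for the storage discussion.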
Table I. Listing of the p, q and r coefficients for the case where A is 2×2, B is 2×3 and C is 2×3.

    index    p_s    q_t    r_m

      0      a11
      1      a12           c11
      2      0      b22
      3      0      b12    c12
      4      0      b23
      5      0      b13    c13
      6      a21
      7      a22           c21
      8      0
      9      0             c22
     10      0
     11      0             c23
0


